
LASIGE and UNICAGE solution to the NASA LitCoin NLP Competition

Authors: Pedro Ruas, Diana F. Sousa, André Neves, Carlos Cruz, Francisco M. Couto

Abstract: Biomedical Natural Language Processing (NLP) tends to become cumbersome for most researchers, frequently due to the amount and heterogeneity of text to be processed. To address this challenge, the industry is continuously developing highly efficient tools and creating more flexible engineering solutions. This work presents the integration between industry data engineering solutions for efficient data processing and academic systems developed for Named Entity Recognition (LasigeUnicage_NER) and Relation Extraction (BiOnt). Our design reflects an integration of those components with external knowledge in the form of additional training data from other datasets and biomedical ontologies. We used this pipeline in the 2022 LitCoin NLP Challenge, where our team LasigeUnicage was awarded the 7th Prize out of approximately 200 participating teams, reflecting a successful collaboration between academia (LASIGE) and industry (Unicage). The software supporting this work is available at https://github.com/lasigeBioTM/Litcoin-Lasige_Unicage.

Submitted 10 August, 2023; originally announced August 2023.

arXiv:2212.13656 [pdf, other]

Smart meter data processing: a showcase for simple and efficient textual processing

Authors: Miguel Ferreira, André Neves, Rodrigo Gorjão, Carlos Cruz, Miguel L. Pardal

Abstract: The increase in the production and collection of data from devices is an ongoing trend due to the roll-out of more cyber-physical applications. Smart meters, because of their importance in power grids, are a class of such devices whose produced data requires meticulous processing. In this paper, we use Unicage, a data processing system based on classic Unix shell scripting that delivers excellent performance in a simple package. We use this methodology to process smart meter data in XML format, subject to the constraints posed by a real use case. We develop a solution that parses, validates and performs a simple aggregation of 27 million XML files in less than 10 minutes. We present a study of the solution as well as the benefits of its adoption.

Submitted 27 December, 2022; originally announced December 2022.

Comments: 11 pages, 5 figures, 1 table, 9 listings. Accepted after review for the 1st Workshop on High-Performance and Reliable Big Data (HPBD 2021), which was held virtually on September 20th 2021, and was co-located with the 40th International Symposium on Reliable Distributed Systems (SRDS 2021)

arXiv:2206.04744 [pdf, other]

doi 10.1080/10447318.2022.2075601

The Developers' Design Thinking Toolbox in Hackathons: A Study on the Recurring Design Methods in Software Development Marathons

Authors: Kiev Gama, George Valença, Pedro Alessio, Rafael Formiga, André Neves, Nycolas Lacerda

Abstract: Hackathons are time-bounded collaborative events of intense teamwork to build prototypes, usually in the form of software, aimed at specific challenges proposed by the organizers. These events became a widespread practice in the IT industry, universities and many other scenarios, as a result of a growing open-innovation trend in the last decade. Since the main deliverable of these events is a demonstrable version of an idea, such as early hardware or software prototypes, the short time frame requires participants to quickly understand the proposed challenge or even identify issues related to a given domain. To create solutions, teams follow an ad-hoc but effective design approach that many times seems informal, since the background of the participants is rather centered on technical aspects (e.g., web and mobile programming) and does not involve any training in Design Thinking. To understand this creative process, we conducted 37 interviews (32 hackathon winners and 5 hackathon organizers) with people from 16 countries. We aimed to identify the design processes and recurring design methods applied by winners in these events. Also, we conducted a focus group with 8 people experienced in hackathons (participants and organizers) to discuss our findings. Our analysis revealed that although hackathon winners with an IT background have no formal training in Design Thinking, they are aware of many design methods, typically following a sequence of phases that involve divergent and convergent thinking to explore the problem space and propose alternatives in a solution space, which is the rationale behind Design Thinking. We derived a set of recommendations based on design strategies that seem to lead to successful hackathon participation. These recommendations can also be useful to organizers who intend to enhance the experience of newcomers in hackathons.

Submitted 9 June, 2022; originally announced June 2022.

Comments: To be published in the International Journal of Human-Computer Interaction (Taylor and Francis)

arXiv:2205.03289 [pdf, other]

Learning to Cooperate with Completely Unknown Teammates

Authors: Alexandre Neves, Alberto Sardinha

Abstract: A key goal of ad hoc teamwork is to develop a learning agent that cooperates with unknown teams, without resorting to any pre-coordination protocol. Despite a vast number of ad hoc teamwork algorithms in the literature, most of them cannot address the problem of learning to cooperate with a completely unknown team unless they learn from scratch. This article presents a novel approach that uses transfer learning alongside the state-of-the-art PLASTIC-Policy to adapt to completely unknown teammates quickly. We test our solution within the Half Field Offense simulator with five different teammates. The teammates were designed independently by developers from different countries and at different times. Our empirical evaluation shows that it is advantageous for an ad hoc agent to leverage its past knowledge when adapting to a new team instead of learning how to cooperate with it from scratch.

Submitted 6 May, 2022; originally announced May 2022.

Comments: 13 pages, 1 figure

ACM Class: I.2.6

arXiv:2006.05569 [pdf, other]

A gaze driven fast-forward method for first-person videos

Authors: Alan Carvalho Neves, Michel Melo Silva, Mario Fernando Montenegro Campos, Erickson Rangel Nascimento

Abstract: The growing data sharing and life-logging cultures are driving an unprecedented increase in the amount of unedited First-Person Videos. In this paper, we address the problem of accessing relevant information in First-Person Videos by creating an accelerated version of the input video and emphasizing the important moments to the recorder. Our method is based on an attention model driven by gaze and visual scene analysis that provides a semantic score of each frame of the input video. We performed several experimental evaluations on publicly available First-Person Videos datasets. The results show that our methodology can fast-forward videos emphasizing moments when the recorder visually interacts with scene components while not including monotonous clips.

Submitted 9 June, 2020; originally announced June 2020.

Comments: Accepted for presentation at EPIC@CVPR2020 workshop

arXiv:2003.09017 [pdf, other]

Xtreaming: an incremental multidimensional projection technique and its application to streaming data

Authors: Tácito T. A. T. Neves, Rafael M. Martins, Danilo B. Coimbra, Kostiantyn Kucher, Andreas Kerren, Fernando V. Paulovich

Abstract: Streaming data applications are becoming more common due to the ability of different information sources to continuously capture or produce data, such as sensors and social media. Despite recent advances, most visualization approaches, in particular, multidimensional projection or dimensionality reduction techniques, cannot be directly applied in such scenarios due to the transient nature of streaming data. Currently, only a few methods address this limitation using online or incremental strategies, continuously processing data, and updating the visualization. Despite their relative success, most of them impose the need for storing and accessing the data multiple times, not being appropriate for streaming where data continuously grow. Others do not impose such requirements but are not capable of updating the position of the data already projected, potentially resulting in visual artifacts. In this paper, we present Xtreaming, a novel incremental projection technique that continuously updates the visual representation to reflect new emerging structures or patterns without visiting the multidimensional data more than once. Our tests show that Xtreaming is competitive in terms of global distance preservation if compared to other streaming and incremental techniques, but it is orders of magnitude faster. To the best of our knowledge, it is the first methodology that is capable of evolving a projection to faithfully represent new emerging structures without the need to store all data, providing reliable results for efficiently and effectively projecting streaming data.

Submitted 7 March, 2020; originally announced March 2020.

Comments: 12 pages, 11 figures

arXiv:1912.12655 [pdf, other]

Personalizing Fast-Forward Videos Based on Visual and Textual Features from Social Network

Authors: Washington L. S. Ramos, Michel M. Silva, Edson R. Araujo, Alan C. Neves, Erickson R. Nascimento

Abstract: The growth of Social Networks has fueled the habit of people logging their day-to-day activities, and long First-Person Videos (FPVs) are one of the main tools in this new habit. Semantic-aware fast-forward methods are able to decrease the watch time and select meaningful moments, which is key to increasing the chances of these videos being watched. However, these methods cannot handle semantics in terms of personalization. In this work, we present a new approach to automatically creating personalized fast-forward videos for FPVs. Our approach explores the availability of text-centric data from the user's social networks such as status updates to infer her/his topics of interest and assigns scores to the input frames according to her/his preferences. Extensive experiments are conducted on three different datasets with simulated and real-world users as input, achieving an average F1 score of up to 12.8 percentage points higher than the best competitors. We also present a user study to demonstrate the effectiveness of our method.

Submitted 29 December, 2019; originally announced December 2019.

arXiv:1912.10799 [pdf, other]

Deteção de estruturas permanentes a partir de dados de séries temporais Sentinel 1 e 2 [Detection of permanent structures from Sentinel 1 and 2 time series data]

Authors: André Neves, Carlos Damásio, João Pires, Fernando Birra

Abstract: Mapping structures such as settlements, roads, individual houses and any other types of artificial structures is of great importance for the analysis of urban growth, masking, image alignment and, especially in the studied use case, the definition of Fuel Management Networks (FGC), which protect buildings from forest fires. Current cartography has a low generation frequency and its resolution may not be suitable for extracting small structures such as small settlements or roads, which may lack forest fire protection. In this paper, we use time series data, extracted from Sentinel-1 and 2 constellations, over Santarém, Mação, to explore the detection of permanent structures at a resolution of 10 by 10 meters. For this purpose, an XGBoost classification model is trained with 133 attributes extracted from the time series from all the bands, including normalized radiometric indices. The results show that the use of time series data increases the accuracy of the extraction of permanent structures compared with using only static data; using multitemporal data also increases the number of detected roads. In general, the final result has a permanent structure mapping with a higher resolution than state-of-the-art settlement maps, and small structures and roads are also more accurately represented. Regarding the use case, by using our final map for the creation of FGC it is possible to simplify and accelerate the process of delimitation of the official FGC.

Submitted 11 December, 2019; originally announced December 2019.

Comments: 12 pages, in Portuguese, 7 figures, conference: INForum 2019

arXiv:1709.08689 [pdf]

doi 10.1109/E3S.2017.8246198

The Benefits of Low Operating Voltage Devices to the Energy Efficiency of Parallel Systems

Authors: Samuel Xavier-de-Souza, Eduardo A. Neves, Alex F. A. Furtunato, Luiz F. Q. Silveira, Kyriakos Georgiou, Kerstin I. Eder

Abstract: Programmable circuits such as general-purpose processors or FPGAs have their end-user energy efficiency strongly dependent on the program that they execute. Ultimately, it is the programmer's ability to code and, in the case of general-purpose processors, the compiler's ability to translate source code into a sequence of native instructions that make the circuit deliver the expected performance to the end user. This way, the benefits of energy-efficient circuits built upon energy-efficient devices could be obfuscated by poorly written software. Clearly, having well-written software running on conventional circuits is no better in terms of energy efficiency than having poorly written software running on energy-efficient circuits. Therefore, to get the most out of the energy-saving capabilities of programmable circuits that support low voltage operating modes, it is necessary to address software issues that might work against the benefits of operating in such modes.

Submitted 13 August, 2017; originally announced September 2017.

Report number: LAPPS2017_001

Showing 1–9 of 9 results for author: Neves, A