-
Development of a Real-Time Simulator Using EMTP-ATP Foreign models for Testing Relays
Authors:
Renzo Fabian,
Rommel Romero
Abstract:
This paper reports the PC implementation of a real-time simulator for testing protective relays, based on the widely used EMTP-ATP software. The proposed simulator was implemented using the GNU/Linux OS with a real-time kernel. A PCI card was used to generate the waveforms corresponding to simulated voltages and currents; this card also includes a digital I/O interface. Via foreign models programmed in standard C, ATP was recompiled to include waveform generation at each simulation time step and digital I/O. Additionally, an open-source IEC-61850 library was used to support the Sampled Values and GOOSE protocols. The resulting tool is a real-time simulator that can interact with protective relays by means of hardware-in-the-loop (HiL) tests. The performance of the simulator was analyzed via an interaction with an actual relay.
Submitted 14 June, 2024;
originally announced June 2024.
-
Gaussian Embedding of Temporal Networks
Authors:
Raphaël Romero,
Jefrey Lijffijt,
Riccardo Rastelli,
Marco Corneli,
Tijl De Bie
Abstract:
Representing the nodes of continuous-time temporal graphs in a low-dimensional latent space has wide-ranging applications, from prediction to visualization. Yet, analyzing continuous-time relational data with timestamped interactions introduces unique challenges due to its sparsity. Merely embedding nodes as trajectories in the latent space overlooks this sparsity, emphasizing the need to quantify uncertainty around the latent positions. In this paper, we propose TGNE (Temporal Gaussian Network Embedding), an innovative method that bridges two distinct strands of literature: the statistical analysis of networks via Latent Space Models (LSM) (Hoff et al., 2002) and temporal graph machine learning. TGNE embeds nodes as piece-wise linear trajectories of Gaussian distributions in the latent space, capturing both structural information and uncertainty around the trajectories. We evaluate TGNE's effectiveness in reconstructing the original graph and modelling uncertainty. The results demonstrate that TGNE generates competitive time-varying embedding locations compared to common baselines for reconstructing unobserved edge interactions based on observed edges. Furthermore, the uncertainty estimates align with the time-varying degree distribution in the network, providing valuable insights into the temporal dynamics of the graph. To facilitate reproducibility, we provide an open-source implementation of TGNE at https://github.com/aida-ugent/tgne.
Submitted 27 May, 2024;
originally announced May 2024.
-
Exploring the Performance of Continuous-Time Dynamic Link Prediction Algorithms
Authors:
Raphaël Romero,
Maarten Buyl,
Tijl De Bie,
Jefrey Lijffijt
Abstract:
Dynamic Link Prediction (DLP) addresses the prediction of future links in evolving networks. However, accurately portraying the performance of DLP algorithms poses challenges that might impede progress in the field. Importantly, common evaluation pipelines usually calculate ranking or binary classification metrics, where the scores of observed interactions (positives) are compared with those of randomly generated ones (negatives). However, a single metric is not sufficient to fully capture the differences between DLP algorithms, and is prone to overly optimistic performance evaluation. Instead, an in-depth evaluation should reflect performance variations across different nodes, edges, and time segments. In this work, we contribute tools to perform such a comprehensive evaluation. (1) We propose Birth-Death diagrams, a simple but powerful visualization technique that illustrates the effect of time-based train-test splitting on the difficulty of DLP on a given dataset. (2) We describe an exhaustive taxonomy of negative sampling methods that can be used at evaluation time. (3) We carry out an empirical study of the effect of the different negative sampling strategies. Our comparison between heuristics and state-of-the-art memory-based methods on various real-world datasets confirms a strong effect of using different negative sampling strategies on the test Area Under the Curve (AUC). Moreover, we conduct a visual exploration of the prediction, with additional insights on which different types of errors are prominent over time.
Submitted 27 May, 2024;
originally announced May 2024.
-
Towards Explainable Test Case Prioritisation with Learning-to-Rank Models
Authors:
Aurora Ramírez,
Mario Berrios,
José Raúl Romero,
Robert Feldt
Abstract:
Test case prioritisation (TCP) is a critical task in regression testing to ensure quality as software evolves. Machine learning has become a common way to achieve it. In particular, learning-to-rank (LTR) algorithms provide an effective method of ordering and prioritising test cases. However, their use poses a challenge in terms of explainability, both globally at the model level and locally for particular results. Here, we present and discuss scenarios that require different explanations and how the particularities of TCP (multiple builds over time, test case and test suite variations, etc.) could influence them. We include a preliminary experiment to analyse the similarity of explanations, showing that they vary depending not only on test case-specific predictions but also on the relative ranks.
Submitted 22 May, 2024;
originally announced May 2024.
-
Explainable LightGBM Approach for Predicting Myocardial Infarction Mortality
Authors:
Ana Letícia Garcez Vicente,
Roseval Donisete Malaquias Junior,
Roseli A. F. Romero
Abstract:
Myocardial infarction is a leading cause of mortality globally, and accurate risk prediction is crucial for improving patient outcomes. Machine learning techniques have shown promise in identifying high-risk patients and predicting outcomes. However, patient data often contain vast amounts of information and missing values, posing challenges for feature selection and imputation methods. In this article, we investigate the impact of the data preprocessing task and compare three ensemble boosted tree methods to predict the risk of mortality in patients with myocardial infarction. Further, we use the Tree Shapley Additive Explanations method to identify relationships among all the features for the performed predictions, leveraging the entirety of the available data in the analysis. Notably, our approach achieved superior performance when compared to other existing machine learning approaches, with an F1-score of 91.2% and an accuracy of 91.8% for LightGBM without data preprocessing.
Submitted 23 April, 2024;
originally announced April 2024.
-
Postoperative glioblastoma segmentation: Development of a fully automated pipeline using deep convolutional neural networks and comparison with currently available models
Authors:
Santiago Cepeda,
Roberto Romero,
Daniel Garcia-Perez,
Guillermo Blasco,
Luigi Tommaso Luppino,
Samuel Kuttner,
Ignacio Arrese,
Ole Solheim,
Live Eikenes,
Anna Karlberg,
Angel Perez-Nunez,
Trinidad Escudero,
Roberto Hornero,
Rosario Sarabia
Abstract:
Accurately assessing tumor removal is paramount in the management of glioblastoma. We developed a pipeline using MRI scans and neural networks to segment tumor subregions and the surgical cavity in postoperative images. Our model excels in accurately classifying the extent of resection, offering a valuable tool for clinicians in assessing treatment effectiveness.
Submitted 17 April, 2024;
originally announced April 2024.
-
Juru: Legal Brazilian Large Language Model from Reputable Sources
Authors:
Roseval Malaquias Junior,
Ramon Pires,
Roseli Romero,
Rodrigo Nogueira
Abstract:
The high computational cost associated with pretraining large language models limits their research. Two strategies have emerged to address this issue: domain specialization and pretraining with high-quality data. To explore these strategies, we specialized the Sabiá-2 Small model with 1.9 billion unique tokens from reputable Brazilian legal sources and conducted few-shot evaluations on legal and general knowledge exams. Our model, Juru, demonstrates the benefits of domain specialization with a reduced amount of pretraining data. However, this specialization comes at the expense of degrading performance in other knowledge areas within the same language. This study contributes to the growing body of scientific evidence showing that pretraining data selection may enhance the performance of large language models, enabling the exploration of these models at a lower cost.
Submitted 26 March, 2024;
originally announced March 2024.
-
JCLEC-MO: a Java suite for solving many-objective optimization engineering problems
Authors:
Aurora Ramírez,
José Raúl Romero,
Carlos García-Martínez,
Sebastián Ventura
Abstract:
Although metaheuristics have been widely recognized as efficient techniques to solve real-world optimization problems, implementing them from scratch remains difficult for domain-specific experts without programming skills. In this scenario, metaheuristic optimization frameworks are a practical alternative as they provide a variety of algorithms composed of customized elements, as well as experimental support. Recently, many engineering problems have come to require the optimization of multiple or even many objectives, increasing the interest in appropriate metaheuristic algorithms and frameworks that might integrate new specific requirements while maintaining the generality and reusability principles they were conceived for. Based on this idea, this paper introduces JCLEC-MO, a Java framework for both multi- and many-objective optimization that enables engineers to apply, or adapt, a great number of multi-objective algorithms with little coding effort. A case study is developed and explained to show how JCLEC-MO can be used to address many-objective engineering problems, often requiring the inclusion of domain-specific elements, and to analyze experimental outcomes by means of conveniently connected R utilities.
Submitted 28 February, 2024;
originally announced February 2024.
-
Evolving machine learning workflows through interactive AutoML
Authors:
Rafael Barbudo,
Aurora Ramírez,
José Raúl Romero
Abstract:
Automatic workflow composition (AWC) is a relevant problem in automated machine learning (AutoML) that allows finding suitable sequences of preprocessing and prediction models together with their optimal hyperparameters. This problem can be solved using evolutionary algorithms and, in particular, grammar-guided genetic programming (G3P). Current G3P approaches to AWC define a fixed grammar that formally specifies how workflow elements can be combined and which algorithms can be included. In this paper we present \ourmethod, an interactive G3P algorithm that allows users to dynamically modify the grammar to prune the search space and focus on their regions of interest. Our proposal is the first to combine the advantages of a G3P method with ideas from interactive optimisation and human-guided machine learning, an area little explored in the context of AutoML. To evaluate our approach, we present an experimental study in which 20 participants interact with \ourmethod to evolve workflows according to their preferences. Our results confirm that the collaboration between \ourmethod and humans allows us to find high-performance workflows in terms of accuracy that require less tuning time than those found without human intervention.
Submitted 28 February, 2024;
originally announced February 2024.
-
Grammar-based evolutionary approach for automated workflow composition with domain-specific operators and ensemble diversity
Authors:
Rafael Barbudo,
Aurora Ramírez,
José Raúl Romero
Abstract:
The process of extracting valuable and novel insights from raw data involves a series of complex steps. In the realm of Automated Machine Learning (AutoML), a significant research focus is on automating aspects of this process, specifically tasks like selecting algorithms and optimising their hyper-parameters. A particularly challenging task in AutoML is automatic workflow composition (AWC). AWC aims to identify the most effective sequence of data preprocessing and ML algorithms, coupled with their best hyper-parameters, for a specific dataset. However, existing AWC methods are limited in how many and in what ways they can combine algorithms within a workflow.
Addressing this gap, this paper introduces EvoFlow, a grammar-based evolutionary approach for AWC. EvoFlow enhances the flexibility in designing workflow structures, empowering practitioners to select algorithms that best fit their specific requirements. EvoFlow stands out by integrating two innovative features. First, it employs a suite of genetic operators, designed specifically for AWC, to optimise both the structure of workflows and their hyper-parameters. Second, it implements a novel updating mechanism that enriches the variety of predictions made by different workflows. Promoting this diversity helps prevent the algorithm from overfitting. With this aim, EvoFlow builds an ensemble whose workflows differ in their misclassified instances.
To evaluate EvoFlow's effectiveness, we carried out empirical validation using a set of classification benchmarks. We begin with an ablation study to demonstrate the enhanced performance attributable to EvoFlow's unique components. Then, we compare EvoFlow with other AWC approaches, encompassing both evolutionary and non-evolutionary techniques. Our findings show that EvoFlow's specialised genetic operators and updating mechanism substantially outperform current leading methods[..]
Submitted 3 February, 2024;
originally announced February 2024.
-
Artificial intelligence to automate the systematic review of scientific literature
Authors:
José de la Torre-López,
Aurora Ramírez,
José Raúl Romero
Abstract:
Artificial intelligence (AI) has acquired notable relevance in modern computing as it effectively solves complex tasks traditionally done by humans. AI provides methods to represent and infer knowledge, efficiently manipulate texts and learn from vast amounts of data. These characteristics are applicable in many activities that humans find laborious or repetitive, as is the case of the analysis of scientific literature. Manually preparing and writing a systematic literature review (SLR) takes considerable time and effort, since it requires planning a strategy, conducting the literature search and analysis, and reporting the findings. Depending on the area under study, the number of papers retrieved can be in the hundreds or thousands, meaning that filtering the relevant ones and extracting the key information becomes a costly and error-prone process. However, some of the involved tasks are repetitive and, therefore, subject to automation by means of AI. In this paper, we present a survey of AI techniques proposed in the last 15 years to help researchers conduct systematic analyses of scientific literature. We describe the tasks currently supported, the types of algorithms applied, and available tools proposed in 34 primary studies. This survey also provides a historical perspective of the evolution of the field and the role that humans can play in an increasingly automated SLR process.
Submitted 13 January, 2024;
originally announced January 2024.
-
InterEvo-TR: Interactive Evolutionary Test Generation With Readability Assessment
Authors:
Pedro Delgado-Pérez,
Aurora Ramírez,
Kevin J. Valle-Gómez,
Inmaculada Medina-Bulo,
José Raúl Romero
Abstract:
Automated test case generation has proven to be useful to reduce the usually high expenses of software testing. However, several studies have also noted the skepticism of testers regarding the comprehension of generated test suites when compared to manually designed ones. This fact suggests that involving testers in the test generation process could be helpful to increase their acceptance of automatically-produced test suites. In this paper, we propose incorporating interactive readability assessments made by a tester into EvoSuite, a widely-known evolutionary test generation tool. Our approach, InterEvo-TR, interacts with the tester at different moments during the search and shows different test cases covering the same coverage target for their subjective evaluation. The design of such an interactive approach involves a schedule of interaction, a method to diversify the selected targets, a plan to save and handle the readability values, and some mechanisms to customize the level of engagement in the revision, among other aspects. To analyze the potential and practicability of our proposal, we conduct a controlled experiment in which 39 participants, including academics, professional developers, and student collaborators, interact with InterEvo-TR. Our results show that the strategy to select and present intermediate results is effective for the purpose of readability assessment. Furthermore, the participants' actions and responses to a questionnaire allowed us to analyze the aspects influencing test code readability and the benefits and limitations of an interactive approach in the context of test case generation, paving the way for future developments based on interactivity.
Submitted 13 January, 2024;
originally announced January 2024.
-
GEML: A Grammar-based Evolutionary Machine Learning Approach for Design-Pattern Detection
Authors:
Rafael Barbudo,
Aurora Ramírez,
Francisco Servant,
José Raúl Romero
Abstract:
Design patterns (DPs) are recognised as a good practice in software development. However, the lack of appropriate documentation often hampers traceability, and their benefits are blurred among thousands of lines of code. Automatic methods for DP detection have become relevant but are usually based on the rigid analysis of either software metrics or specific properties of the source code. We propose GEML, a novel detection approach based on evolutionary machine learning using software properties of diverse nature. Firstly, GEML makes use of an evolutionary algorithm to extract those characteristics that better describe the DP, formulated in terms of human-readable rules, whose syntax is conformant with a context-free grammar. Secondly, a rule-based classifier is built to predict whether new code contains a hidden DP implementation. GEML has been validated over five DPs taken from a public repository recurrently adopted by machine learning studies. Then, we increase this number up to 15 diverse DPs, showing its effectiveness and robustness in terms of detection capability. An initial parameter study served to tune a parameter setup whose performance guarantees the general applicability of this approach without the need to adjust complex parameters to a specific pattern. Finally, a demonstration tool is also provided.
Submitted 13 January, 2024;
originally announced January 2024.
-
Interactive Multi-Objective Evolutionary Optimization of Software Architectures
Authors:
Aurora Ramírez,
José Raúl Romero,
Sebastián Ventura
Abstract:
While working on a software specification, designers usually need to evaluate different architectural alternatives to be sure that quality criteria are met. Even when these quality aspects could be expressed in terms of multiple software metrics, other qualitative factors cannot be numerically measured, but they are extracted from the engineer's know-how and prior experiences. In fact, detecting not only strong but also weak points in the different solutions seems to fit better with the way humans make their decisions. Putting the human in the loop brings new challenges to the search-based software engineering field, especially for those human-centered activities within the early analysis phase. This paper explores how interactive evolutionary computation can serve as a basis for integrating the human's judgment into the search process. An interactive approach is proposed to discover software architectures, in which both quantitative and qualitative criteria are applied to guide a multi-objective evolutionary algorithm. The obtained feedback is incorporated into the fitness function using architectural preferences, allowing the algorithm to discern between promising and poor solutions. Experimentation with real users has revealed that the proposed interaction mechanism can effectively guide the search towards those regions of the search space that are of real interest to the expert.
Submitted 8 January, 2024;
originally announced January 2024.
-
New Perspectives on the Evaluation of Link Prediction Algorithms for Dynamic Graphs
Authors:
Raphaël Romero,
Tijl De Bie,
Jefrey Lijffijt
Abstract:
There is a fast-growing body of research on predicting future links in dynamic networks, with many new algorithms. Some benchmark data exists, and performance evaluations commonly rely on comparing the scores of observed network events (positives) with those of randomly generated ones (negatives). These evaluation measures depend on both the predictive ability of the model and, crucially, the type of negative samples used. Besides, as is generally the case with temporal data, prediction quality may vary over time. This creates a complex evaluation space. In this work, we catalog the possibilities for negative sampling and introduce novel visualization methods that can yield insight into prediction performance and the dynamics of temporal networks. We leverage these visualization tools to investigate the effect of negative sampling on the predictive performance, at the node and edge level. We validate empirically, on datasets extracted from recent benchmarks, that the error is typically not evenly distributed across different data segments. Finally, we argue that such visualization tools can serve as powerful guides to evaluate dynamic link prediction methods at different levels.
Submitted 30 November, 2023;
originally announced November 2023.
-
Deep Learning Based Detection of Enlarged Perivascular Spaces on Brain MRI
Authors:
Tanweer Rashid,
Hangfan Liu,
Jeffrey B. Ware,
Karl Li,
Jose Rafael Romero,
Elyas Fadaee,
Ilya M. Nasrallah,
Saima Hilal,
R. Nick Bryan,
Timothy M. Hughes,
Christos Davatzikos,
Lenore Launer,
Sudha Seshadri,
Susan R. Heckbert,
Mohamad Habes
Abstract:
BACKGROUND AND PURPOSE: Deep learning has been shown to be effective in many neuroimaging applications. However, in many scenarios, the number of imaging sequences capturing information related to small vessel disease lesions is insufficient to support data-driven techniques. Additionally, cohort-based studies may not always have the optimal or essential imaging sequences for accurate lesion detection. Therefore, it is necessary to determine which imaging sequences are crucial for precise detection. This study introduces a novel deep learning framework to detect enlarged perivascular spaces (ePVS) and aims to find the optimal combination of MRI sequences for deep learning-based quantification. MATERIALS AND METHODS: We implemented an effective lightweight U-Net adapted for ePVS detection and comprehensively investigated different combinations of information from SWI, FLAIR, T1-weighted (T1w), and T2-weighted (T2w) MRI sequences. The training data included 21 participants, who were randomly selected from the MESA cohort. Participants had 683 ePVS lesions on average. For T1w, T2w, and FLAIR images, the MESA study collected 3D isotropic MRI scans at six different sites with Siemens scanners. Our training data included participants from all these sites and all the scanner models, and the proposed model was applied to the whole brain instead of selective regions. RESULTS: The experimental results showed that T2w MRI is the most important for accurate ePVS detection, and the incorporation of SWI, FLAIR and T1w MRI in the deep neural network yielded minor improvements in accuracy and resulted in the highest sensitivity and precision (sensitivity = 0.82, precision = 0.83). The proposed method achieved comparable accuracy at a minimal time cost compared to manual reading.
Submitted 14 October, 2022; v1 submitted 27 September, 2022;
originally announced September 2022.
-
Encoding High-level Quantum Programs as SZX-diagrams
Authors:
Augustin Borgna,
Rafael Romero
Abstract:
The Scalable ZX-calculus is a compact graphical language used to reason about linear maps between quantum states. These diagrams have multiple applications, but they frequently have to be constructed on a case-by-case basis. In this work we present a method to encode quantum programs implemented in a fragment of the linear dependently typed Proto-Quipper-D language as families of SZX-diagrams. We define a subset of translatable Proto-Quipper-D programs and show that our procedure is able to encode non-trivial algorithms as diagrams that grow linearly in the size of the program.
Submitted 16 November, 2023; v1 submitted 19 June, 2022;
originally announced June 2022.
-
Body Gesture Recognition to Control a Social Robot
Authors:
Javier Laplaza,
Joan Jaume Oliver,
Ramón Romero,
Alberto Sanfeliu,
Anaís Garrell
Abstract:
In this work, we propose a gesture-based language to allow humans to interact with robots using their body in a natural way. We have created a new gesture detection model using neural networks and a custom dataset of humans performing a set of body gestures to train our network. Furthermore, we compare body gesture communication with other communication channels to acknowledge the importance of adding this knowledge to robots. The presented approach is extensively validated in diverse simulations and real-life experiments with non-trained volunteers. It attains remarkable results and shows that it is a valuable framework for social robotics applications, such as human-robot collaboration or human-robot interaction.
Submitted 15 June, 2022;
originally announced June 2022.
-
Graph-Survival: A Survival Analysis Framework for Machine Learning on Temporal Networks
Authors:
Raphaël Romero,
Bo Kang,
Tijl De Bie
Abstract:
Continuous-time temporal networks are attracting increasing attention due to their omnipresence in real-world datasets and their manifold applications. While static network models have been successful in capturing static topological regularities, they often fail to model effects coming from the causal nature that explains the generation of networks. Exploiting the temporal aspect of networks has thus been the focus of various studies in the last decades.
We propose a framework for designing generative models for continuous-time temporal networks. Making a first-order Markov assumption on the edge-specific temporal point processes enables us to flexibly apply survival analysis models directly on the waiting time between events, while using time-varying history-based features as covariates for these predictions. This approach links the well-documented field of temporal network analysis through multivariate point processes with methodological tools adapted from survival analysis. We propose a fitting method for models within this framework, and an algorithm for simulating new temporal networks having desired properties. We evaluate our method on a downstream future link prediction task, and provide a qualitative assessment of the network simulations.
Submitted 15 March, 2022; v1 submitted 14 March, 2022;
originally announced March 2022.
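Illustrative sketch for the Graph-Survival entry above: the abstract describes applying survival analysis to the waiting time between events on each edge. The snippet below shows only that first step, under loud assumptions: the function names `waiting_times` and `exponential_rate` are hypothetical, the constant-hazard (exponential) model is a stand-in chosen for brevity, and no covariates or censoring are handled. It is not the authors' model or fitting method.

```python
from collections import defaultdict


def waiting_times(events):
    """Group timestamped (u, v, t) interactions by edge and return the
    gaps between consecutive events on each edge."""
    times = defaultdict(list)
    for u, v, t in events:
        times[(u, v)].append(t)
    gaps = {}
    for edge, ts in times.items():
        ts.sort()
        gaps[edge] = [later - earlier for earlier, later in zip(ts, ts[1:])]
    return gaps


def exponential_rate(edge_gaps):
    """Maximum-likelihood rate of an exponential waiting-time model
    (an assumed stand-in): number of gaps divided by their total length."""
    total = sum(edge_gaps)
    return len(edge_gaps) / total if total > 0 else float("nan")


# Toy event list (hypothetical data): (source, target, timestamp).
events = [
    ("a", "b", 0.0), ("a", "b", 2.0), ("a", "b", 3.0),
    ("b", "c", 1.0), ("b", "c", 5.0),
]
gaps = waiting_times(events)
rates = {edge: exponential_rate(g) for edge, g in gaps.items()}
```

A richer model in the spirit of the abstract would replace the constant rate with a hazard that depends on time-varying, history-based covariates.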
-
A Taxonomy of Information Attributes for Test Case Prioritisation: Applicability, Machine Learning
Authors:
Aurora Ramírez,
Robert Feldt,
José Raúl Romero
Abstract:
Most software companies have extensive test suites and re-run parts of them continuously to ensure recent changes have no adverse effects. Since test suites are costly to execute, industry needs methods for test case prioritisation (TCP). Recently, TCP methods use machine learning (ML) to exploit the information known about the system under test (SUT) and its test cases. However, the value added by ML-based TCP methods should be critically assessed with respect to the cost of collecting the information. This paper analyses two decades of TCP research, and presents a taxonomy of 91 information attributes that have been used. The attributes are classified with respect to their information sources and the characteristics of their extraction process. Based on this taxonomy, TCP methods validated with industrial data and those applying ML are analysed in terms of information availability, attribute combination and definition of data features suitable for ML. Relying on a high number of information attributes, assuming easy access to SUT code and simplified testing environments are identified as factors that might hamper industrial applicability of ML-based TCP. The TePIA taxonomy provides a reference framework to unify terminology and evaluate alternatives considering the cost-benefit of the information attributes.
Submitted 16 January, 2022;
originally announced January 2022.
-
Large-scale Autonomous Flight with Real-time Semantic SLAM under Dense Forest Canopy
Authors:
Xu Liu,
Guilherme V. Nardari,
Fernando Cladera Ojeda,
Yuezhan Tao,
Alex Zhou,
Thomas Donnelly,
Chao Qu,
Steven W. Chen,
Roseli A. F. Romero,
Camillo J. Taylor,
Vijay Kumar
Abstract:
Semantic maps represent the environment using a set of semantically meaningful objects. This representation is storage-efficient, less ambiguous, and more informative, thus facilitating large-scale autonomy and the acquisition of actionable information in highly unstructured, GPS-denied environments. In this letter, we propose an integrated system that can perform large-scale autonomous flights and real-time semantic mapping in challenging under-canopy environments. We detect and model tree trunks and ground planes from LiDAR data, which are associated across scans and used to constrain robot poses as well as tree trunk models. The autonomous navigation module utilizes a multi-level planning and mapping framework and computes dynamically feasible trajectories that lead the UAV to build a semantic map of the user-defined region of interest in a computationally and storage efficient manner. A drift-compensation mechanism is designed to minimize the odometry drift using semantic SLAM outputs in real time, while maintaining planner optimality and controller stability. This leads the UAV to execute its mission accurately and safely at scale.
Submitted 15 August, 2023; v1 submitted 14 September, 2021;
originally announced September 2021.
-
A Neurorobotics Approach to Behaviour Selection based on Human Activity Recognition
Authors:
Caetano M. Ranieri,
Renan C. Moioli,
Patricia A. Vargas,
Roseli A. F. Romero
Abstract:
Behaviour selection has been an active research topic for robotics, in particular in the field of human-robot interaction. For a robot to interact effectively and autonomously with humans, the coupling between techniques for human activity recognition, based on sensing information, and robot behaviour selection, based on decision-making mechanisms, is of paramount importance. However, most approaches to date consist of deterministic associations between the recognised activities and the robot behaviours, neglecting the uncertainty inherent to sequential predictions in real-time applications. In this paper, we address this gap by presenting a neurorobotics approach based on computational models that resemble neurophysiological aspects of living beings. This neurorobotics approach was compared to a non-bioinspired, heuristics-based approach. To evaluate both approaches, a robot simulation is developed, in which a mobile robot has to accomplish tasks according to the activity being performed by the inhabitant of an intelligent home. The outcomes of each approach were evaluated according to the number of correct outcomes provided by the robot. Results revealed that the neurorobotics approach is advantageous, especially considering the computational models based on more complex animals.
Submitted 27 September, 2022; v1 submitted 26 July, 2021;
originally announced July 2021.
-
A Data-Driven Biophysical Computational Model of Parkinson's Disease based on Marmoset Monkeys
Authors:
Caetano M. Ranieri,
Jhielson M. Pimentel,
Marcelo R. Romano,
Leonardo A. Elias,
Roseli A. F. Romero,
Michael A. Lones,
Mariana F. P. Araujo,
Patricia A. Vargas,
Renan C. Moioli
Abstract:
In this work we propose a new biophysical computational model of brain regions relevant to Parkinson's disease (PD) based on local field potential data collected from the brain of marmoset monkeys. Parkinson's disease is a neurodegenerative disorder, linked to the death of dopaminergic neurons at the substantia nigra pars compacta, which affects the normal dynamics of the basal ganglia-thalamus-cortex neuronal circuit of the brain. Although there are multiple mechanisms underlying the disease, a complete description of those mechanisms and of the molecular pathogenesis is still missing, and there is still no cure. To address this gap, computational models that resemble neurobiological aspects found in animal models have been proposed. In our model, we adopted a data-driven approach in which a set of biologically constrained parameters is optimised using differential evolution. Evolved models successfully resembled single-neuron mean firing rates and spectral signatures of local field potentials from healthy and parkinsonian marmoset brain data. To the best of our knowledge, this is the first computational model of Parkinson's disease based on simultaneous electrophysiological recordings from seven brain regions of marmoset monkeys. Results show that the proposed model could facilitate the investigation of the mechanisms of PD and support the development of techniques that can indicate new therapies. It could also be applied to other computational neuroscience problems in which biological data could be used to fit multi-scale models of brain circuits.
Submitted 1 September, 2021; v1 submitted 26 July, 2021;
originally announced July 2021.
-
Benchmarking AutoML Frameworks for Disease Prediction Using Medical Claims
Authors:
Roland Albert A. Romero,
Mariefel Nicole Y. Deypalan,
Suchit Mehrotra,
John Titus Jungao,
Natalie E. Sheils,
Elisabetta Manduchi,
Jason H. Moore
Abstract:
We ascertain and compare the performances of AutoML tools on large, highly imbalanced healthcare datasets.
We generated a large dataset using historical administrative claims including demographic information and flags for disease codes in four different time windows prior to 2019. We then trained three AutoML tools on this dataset to predict six different disease outcomes in 2019 and evaluated model performances on several metrics.
The AutoML tools showed improvement from the baseline random forest model but did not differ significantly from each other. All models recorded low area under the precision-recall curve and failed to predict true positives while keeping the true negative rate high. Model performance was not directly related to prevalence. We provide a specific use-case to illustrate how to select a threshold that gives the best balance between true and false positive rates, as this is an important consideration in medical applications.
Healthcare datasets present several challenges for AutoML tools, including large sample size, high imbalance, and limitations in the available features types. Improvements in scalability, combinations of imbalance-learning resampling and ensemble approaches, and curated feature selection are possible next steps to achieve better performance.
Among the three explored, no AutoML tool consistently outperforms the rest in terms of predictive performance. The performances of the models in this study suggest that there may be room for improvement in handling medical claims data. Finally, selection of the optimal prediction threshold should be guided by the specific practical application.
Submitted 22 July, 2021;
originally announced July 2021.
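Illustrative sketch for the AutoML benchmarking entry above: the abstract mentions selecting a threshold that balances true and false positive rates but does not say which criterion was used. The snippet below assumes Youden's J statistic (TPR minus FPR) as one common choice; the function names and the toy labels and scores are hypothetical, not data from the paper.

```python
def rates_at(threshold, y_true, scores):
    """Return (TPR, FPR) when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and label:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


def best_threshold(y_true, scores):
    """Scan every observed score as a candidate threshold and keep the one
    maximising Youden's J = TPR - FPR (an assumed criterion)."""
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        tpr, fpr = rates_at(t, y_true, scores)
        j = tpr - fpr
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Toy, imbalanced example (hypothetical): 3 positives among 8 cases.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.9]
threshold, j = best_threshold(y_true, scores)
```

In a clinical setting the two error types rarely cost the same, so a weighted criterion may be more appropriate than the unweighted J used here.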
-
A Note on Confluence in Typed Probabilistic Lambda Calculi
Authors:
Rafael Romero,
Alejandro Díaz-Caro
Abstract:
On the topic of probabilistic rewriting, there are several works studying both termination and confluence of different systems. While working with a lambda calculus modelling quantum computation, we found a system with probabilistic rewriting rules and strongly normalizing terms. We examine the effect of small modifications in probabilistic rewriting, affine variables, and strategies on the overall confluence in this strongly normalizing probabilistic calculus.
Submitted 8 April, 2022; v1 submitted 11 June, 2021;
originally announced June 2021.
-
Place Recognition in Forests with Urquhart Tessellations
Authors:
Guilherme V. Nardari,
Avraham Cohen,
Steven W. Chen,
Xu Liu,
Vaibhav Arcot,
Roseli A. F. Romero,
Vijay Kumar
Abstract:
In this letter, we present a novel descriptor based on Urquhart tessellations derived from the position of trees in a forest. We propose a framework that uses these descriptors to detect previously seen observations and landmark correspondences, even with partial overlap and noise. We run loop closure detection experiments in simulation and real-world data map-merging from different flights of an Unmanned Aerial Vehicle (UAV) in a pine tree forest and show that our method outperforms state-of-the-art approaches in accuracy and robustness.
Submitted 16 November, 2020; v1 submitted 23 September, 2020;
originally announced October 2020.
-
DEEPMIR: A DEEP neural network for differential detection of cerebral Microbleeds and IRon deposits in MRI
Authors:
Tanweer Rashid,
Ahmed Abdulkadir,
Ilya M. Nasrallah,
Jeffrey B. Ware,
Hangfan Liu,
Pascal Spincemaille,
J. Rafael Romero,
R. Nick Bryan,
Susan R. Heckbert,
Mohamad Habes
Abstract:
Lobar cerebral microbleeds (CMBs) and localized non-hemorrhage iron deposits in the basal ganglia have been associated with brain aging, vascular disease and neurodegenerative disorders. Particularly, CMBs are small lesions and require multiple neuroimaging modalities for accurate detection. Quantitative susceptibility mapping (QSM) derived from in vivo magnetic resonance imaging (MRI) is necessary to differentiate between iron content and mineralization. We set out to develop a deep learning-based segmentation method suitable for segmenting both CMBs and iron deposits. We included a convenience sample of 24 participants from the MESA cohort and used T2-weighted images, susceptibility weighted imaging (SWI), and QSM to segment the two types of lesions. We developed a protocol for simultaneous manual annotation of CMBs and non-hemorrhage iron deposits in the basal ganglia. This manual annotation was then used to train a deep convolutional neural network (CNN). Specifically, we adapted the U-Net model with a higher number of resolution layers to be able to detect small lesions such as CMBs from standard resolution MRI. We tested different combinations of the three modalities to determine the most informative data sources for the detection tasks. In the detection of CMBs using single class and multiclass models, we achieved an average sensitivity and precision of 0.84-0.88 and 0.40-0.59, respectively. The same framework detected non-hemorrhage iron deposits with an average sensitivity and precision of about 0.75-0.81 and 0.62-0.75, respectively. Our results showed that deep learning could automate the detection of small vessel disease lesions and that including multimodal MR data (particularly QSM) can improve the detection of CMB and non-hemorrhage iron deposits with sensitivity and precision that are compatible with use in large-scale research studies.
Submitted 7 June, 2021; v1 submitted 30 September, 2020;
originally announced October 2020.
-
SLOAM: Semantic Lidar Odometry and Mapping for Forest Inventory
Authors:
Steven W. Chen,
Guilherme V. Nardari,
Elijah S. Lee,
Chao Qu,
Xu Liu,
Roseli A. F. Romero,
Vijay Kumar
Abstract:
This paper describes an end-to-end pipeline for tree diameter estimation based on semantic segmentation and lidar odometry and mapping. Accurate mapping of this type of environment is challenging since the ground and the trees are surrounded by leaves, thorns and vines, and the sensor typically experiences extreme motion. We propose a semantic feature based pose optimization that simultaneously refines the tree models while estimating the robot pose. The pipeline utilizes a custom virtual reality tool for labeling 3D scans that is used to train a semantic segmentation network. The masked point cloud is used to compute a trellis graph that identifies individual instances and extracts relevant features that are used by the SLAM module. We show that traditional lidar and image based methods fail in the forest environment on both Unmanned Aerial Vehicle (UAV) and hand-carry systems, while our method is more robust, scalable, and automatically generates tree diameter estimations.
Submitted 29 December, 2019;
originally announced December 2019.