Search | arXiv e-print repository

Optimal synthesis embeddings

Authors: Roberto Santana, Mauricio Romero Sicre

Abstract: In this paper we introduce a word embedding composition method based on the intuitive idea that a fair embedding representation for a given set of words should be at the same distance from the vector representation of each of its constituents, and that this distance should be minimized. The embedding composition method can work with static and contextualized word representations; it can be applied to create representations of sentences and also to learn representations of sets of words that are not necessarily organized as a sequence. We theoretically characterize the conditions for the existence of this type of representation and derive the solution. We evaluate the method in data augmentation and sentence classification tasks, investigating several design choices of embeddings and composition methods. We show that our approach excels in solving probing tasks designed to capture simple linguistic features of sentences.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2404.15118 [pdf, other]

Identifying phase transitions in physical systems with neural networks: a neural architecture search perspective

Authors: Rodrigo Carmo Terin, Zochil González Arenas, Roberto Santana

Abstract: The use of machine learning algorithms to investigate phase transitions in physical systems is a valuable way to better understand the characteristics of these systems. Neural networks have been used to extract information of phases and phase transitions directly from many-body configurations. However, one limitation of neural networks is that they require the definition of the model architecture and parameters previous to their application, and such determination is itself a difficult problem. In this paper, we investigate for the first time the relationship between the accuracy of neural networks for information of phases and the network configuration (that comprises the architecture and hyperparameters). We formulate the phase analysis as a regression task, address the question of generating data that reflects the different states of the physical system, and evaluate the performance of neural architecture search for this task. After obtaining the optimized architectures, we further implement smart data processing and analytics by means of neuron coverage metrics, assessing the capability of these metrics to estimate phase transitions. Our results identify the neuron coverage metric as promising for detecting phase transitions in physical systems.

Submitted 23 April, 2024; originally announced April 2024.

Comments: 9 pages, 7 figures

arXiv:2403.13740 [pdf, other]

Uncertainty-Aware Explanations Through Probabilistic Self-Explainable Neural Networks

Authors: Jon Vadillo, Roberto Santana, Jose A. Lozano, Marta Kwiatkowska

Abstract: The lack of transparency of Deep Neural Networks continues to be a limitation that severely undermines their reliability and usage in high-stakes applications. Promising approaches to overcome such limitations are Prototype-Based Self-Explainable Neural Networks (PSENNs), whose predictions rely on the similarity between the input at hand and a set of prototypical representations of the output classes, offering therefore a deep, yet transparent-by-design, architecture. So far, such models have been designed by considering pointwise estimates for the prototypes, which remain fixed after the learning phase of the model. In this paper, we introduce a probabilistic reformulation of PSENNs, called Prob-PSENN, which replaces point estimates for the prototypes with probability distributions over their values. This provides not only a more flexible framework for an end-to-end learning of prototypes, but can also capture the explanatory uncertainty of the model, which is a missing feature in previous approaches. In addition, since the prototypes determine both the explanation and the prediction, Prob-PSENNs allow us to detect when the model is making uninformed or uncertain predictions, and to obtain valid explanations for them. Our experiments demonstrate that Prob-PSENNs provide more meaningful and robust explanations than their non-probabilistic counterparts, thus enhancing the explainability and reliability of the models.

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.09249 [pdf, other]

Leveraging Constraint Programming in a Deep Learning Approach for Dynamically Solving the Flexible Job-Shop Scheduling Problem

Authors: Imanol Echeverria, Maialen Murua, Roberto Santana

Abstract: Recent advancements in the flexible job-shop scheduling problem (FJSSP) are primarily based on deep reinforcement learning (DRL) due to its ability to generate high-quality, real-time solutions. However, DRL approaches often fail to fully harness the strengths of existing techniques such as exact methods or constraint programming (CP), which can excel at finding optimal or near-optimal solutions for smaller instances. This paper aims to integrate CP within a deep learning (DL) based methodology, leveraging the benefits of both. In this paper, we introduce a method that involves training a DL model using optimal solutions generated by CP, ensuring the model learns from high-quality data, thereby eliminating the need for the extensive exploration typical in DRL and enhancing overall performance. Further, we integrate CP into our DL framework to jointly construct solutions, utilizing DL for the initial complex stages and transitioning to CP for optimal resolution as the problem is simplified. Our hybrid approach has been extensively tested on three public FJSSP benchmarks, demonstrating superior performance over five state-of-the-art DRL approaches and a widely-used CP solver. Additionally, with the objective of exploring the application to other combinatorial optimization problems, promising preliminary results are presented on applying our hybrid approach to the traveling salesman problem, combining an exact method with a well-known DRL method.

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.03648 [pdf, other]

doi 10.3390/s24051695

A Connector for Integrating NGSI-LD Data into Open Data Portals

Authors: Laura Martín, Jorge Lanza, Víctor González, Juan Ramón Santana, Pablo Sotres, Luis Sánchez

Abstract: Nowadays, there are plenty of data sources generating massive amounts of information that, combined with novel data analytics frameworks, are meant to support optimisation in many application domains. Nonetheless, there are still shortcomings in terms of data discoverability, accessibility and interoperability. Open Data portals have emerged as a shift towards openness and discoverability. However, they do not impose any condition on the data itself; they only stipulate how datasets have to be described. Alternatively, the NGSI-LD standard pursues harmonisation in terms of data modelling and accessibility. This paper presents a solution that bridges these two domains (i.e., Open Data portals and NGSI-LD-based data) in order to keep benefiting from the structured description of datasets offered by Open Data portals, while ensuring the interoperability provided by the NGSI-LD standard. Our solution aggregates the data into coherent datasets and generates high-quality descriptions, ensuring comprehensiveness, interoperability and accessibility. The proposed solution has been validated through a real-world implementation that exposes IoT data in NGSI-LD format through the European Data Portal (EDP). Moreover, the results from the Metadata Quality Assessment that the EDP implements show that the generated dataset descriptions achieve an excellent ranking in terms of the Findability, Accessibility, Interoperability and Reusability (FAIR) data principles.

Submitted 6 March, 2024; originally announced March 2024.

Comments: This work belongs to the Special Issue Data Engineering in the Internet of Things of MDPI Sensors. This work has been partially supported by the project SALTED from the European Union's Connecting Europe Facility program under Action Number 2020-EU-IA-0274, and by the project SITED under Grant Agreement No. PID2021-125725OB-I00 funded by MCIN/AEI/10.13039/501100011033 and the European Union FEDER

Journal ref: Sensors 2024, 24, 1695

arXiv:2403.03196 [pdf]

doi 10.1016/j.bjp.2013.12.020

SmartSantander: IoT Experimentation over a Smart City Testbed

Authors: Luis Sanchez, Luis Muñoz, Jose Antonio Galache, Pablo Sotres, Juan R. Santana, Veronica Gutierrez, Rajiv Ramdhany, Alex Gluhak, Srdjan Krco, Evangelos Theodoridis, Dennis Pfisterer

Abstract: This paper describes the deployment and experimentation architecture of the Internet of Things experimentation facility being deployed at Santander city. The facility is implemented within the SmartSantander project, one of the projects of the Future Internet Research and Experimentation initiative of the European Commission and represents a unique in the world city-scale experimental research facility. Additionally, this facility supports typical applications and services of a smart city. Tangible results are expected to influence the definition and specification of Future Internet architecture design from viewpoints of Internet of Things and Internet of Services. The facility comprises a large number of Internet of Things devices deployed in several urban scenarios which will be federated into a single testbed. In this paper the deployment being carried out at the main location, namely Santander city, is described. Besides presenting the current deployment, in this article the main insights in terms of the architectural design of a large-scale IoT testbed are presented as well. Furthermore, solutions adopted for implementation of the different components addressing the required testbed functionalities are also sketched out. The IoT experimentation facility described in this paper is conceived to provide a suitable platform for large scale experimentation and evaluation of IoT concepts under real-life conditions.

Submitted 5 March, 2024; originally announced March 2024.

Comments: This work is published in Elsevier Computer Networks. This work has been funded by research project SmartSantander, under FP7-ICT-2009-5 of the 7th Framework Programme of the European Community

Journal ref: Computer Networks, Volume 61, 14 March 2014, Pages 217-238

arXiv:2310.15706 [pdf, other]

Solving the flexible job-shop scheduling problem through an enhanced deep reinforcement learning approach

Authors: Imanol Echeverria, Maialen Murua, Roberto Santana

Abstract: In scheduling problems common in the industry and various real-world scenarios, responding in real-time to disruptive events is essential. Recent methods propose the use of deep reinforcement learning (DRL) to learn policies capable of generating solutions under this constraint. The objective of this paper is to introduce a new DRL method for solving the flexible job-shop scheduling problem, particularly for large instances. The approach is based on the use of heterogeneous graph neural networks to obtain a more informative graph representation of the problem. This novel modeling of the problem enhances the policy's ability to capture state information and improves its decision-making capacity. Additionally, we introduce two novel approaches to enhance the performance of the DRL approach: the first involves generating a diverse set of scheduling policies, while the second combines DRL with dispatching rules (DRs) constraining the action space. Experimental results on two public benchmarks show that our approach outperforms DRs and achieves superior results compared to three state-of-the-art DRL methods, particularly for large instances.

Submitted 30 January, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

arXiv:2306.09628 [pdf, other]

Structural Restricted Boltzmann Machine for image denoising and classification

Authors: Arkaitz Bidaurrazaga, Aritz Pérez, Roberto Santana

Abstract: Restricted Boltzmann Machines are generative models that consist of a layer of hidden variables connected to another layer of visible units, and they are used to model the distribution over visible variables. In order to gain a higher representability power, many hidden units are commonly used, which, in combination with a large number of visible units, leads to a high number of trainable parameters. In this work we introduce the Structural Restricted Boltzmann Machine model, which, taking advantage of the structure of the data at hand, constrains connections of hidden units to subsets of visible units in order to significantly reduce the number of trainable parameters without compromising performance. As a possible area of application, we focus on image modelling. Based on the nature of the images, the structure of the connections is given in terms of spatial neighbourhoods over the pixels of the image that constitute the visible variables of the model. We conduct extensive experiments on various image domains. Image denoising is evaluated with corrupted images from the MNIST dataset. The generative power of our models is compared to vanilla RBMs, as well as their classification performance, which is assessed with five different image domains. Results show that our proposed model has a faster and more stable training, while also obtaining better results compared to an RBM with no constrained connections between its visible and hidden units.

Submitted 16 June, 2023; originally announced June 2023.

arXiv:2305.03431 [pdf, other]

Hearing the voice of experts: Unveiling Stack Exchange communities' knowledge of test smells

Authors: Luana Martins, Denivan Campos, Railana Santana, Joselito Mota Junior, Heitor Costa, Ivan Machado

Abstract: Refactorings are transformations to improve the code design without changing overall functionality and observable behavior. During the refactoring process of smelly test code, practitioners may struggle to identify refactoring candidates and define and apply corrective strategies. This paper reports on an empirical study aimed at understanding how test smells and test refactorings are discussed on the Stack Exchange network. Developers commonly count on Stack Exchange to pick the brains of the wise, i.e., to 'look up' how others are completing similar tasks. Therefore, in light of data from the Stack Exchange discussion topics, we could examine how developers understand and perceive test smells, the corrective actions they take to handle them, and the challenges they face when refactoring test code aiming to fix test smells. We observed that developers are interested in others' perceptions and hands-on experience handling test code issues. Besides, there is a clear indication that developers more often ask whether test smells or anti-patterns are good or bad testing practices than for code-based refactoring recommendations.

Submitted 5 May, 2023; originally announced May 2023.

Comments: Preprint of the manuscript accepted for publication at CHASE 2023

arXiv:2303.02801 [pdf, ps, other]

Neuroevolutionary algorithms driven by neuron coverage metrics for semi-supervised classification

Authors: Roberto Santana, Ivan Hidalgo-Cenalmor, Unai Garciarena, Alexander Mendiburu, Jose Antonio Lozano

Abstract: In some machine learning applications the availability of labeled instances for supervised classification is limited while unlabeled instances are abundant. Semi-supervised learning algorithms deal with these scenarios and attempt to exploit the information contained in the unlabeled examples. In this paper, we address the question of how to evolve neural networks for semi-supervised problems. We introduce neuroevolutionary approaches that exploit unlabeled instances by using neuron coverage metrics computed on the neural network architecture encoded by each candidate solution. Neuron coverage metrics resemble code coverage metrics used to test software, but are oriented to quantify how the different neural network components are covered by test instances. In our neuroevolutionary approach, we define fitness functions that combine classification accuracy computed on labeled examples and neuron coverage metrics evaluated using unlabeled examples. We assess the impact of these functions on semi-supervised problems with a varying amount of labeled instances. Our results show that the use of neuron coverage metrics helps neuroevolution to become less sensitive to the scarcity of labeled data, and can lead in some cases to a more robust generalization of the learned classifiers.

Submitted 5 March, 2023; originally announced March 2023.

arXiv:2302.12565 [pdf, other]

Variational Linearized Laplace Approximation for Bayesian Deep Learning

Authors: Luis A. Ortega, Simón Rodríguez Santana, Daniel Hernández-Lobato

Abstract: The Linearized Laplace Approximation (LLA) has been recently used to perform uncertainty estimation on the predictions of pre-trained deep neural networks (DNNs). However, its widespread application is hindered by significant computational costs, particularly in scenarios with a large number of training points or DNN parameters. Consequently, additional approximations of LLA, such as Kronecker-factored or diagonal approximate GGN matrices, are utilized, potentially compromising the model's performance. To address these challenges, we propose a new method for approximating LLA using a variational sparse Gaussian Process (GP). Our method is based on the dual RKHS formulation of GPs and retains, as the predictive mean, the output of the original DNN. Furthermore, it allows for efficient stochastic optimization, which results in sub-linear training time in the size of the training dataset. Specifically, its training cost is independent of the number of training points. We compare our proposed method against accelerated LLA (ELLA), which relies on the Nyström approximation, as well as other LLA variants employing the sample-then-optimize principle. Experimental results, both on regression and classification datasets, show that our method outperforms these already existing efficient variants of LLA, both in terms of the quality of the predictive distribution and in terms of total computational time.

Submitted 22 May, 2024; v1 submitted 24 February, 2023; originally announced February 2023.

Comments: 22 pages, 8 figures, ICML 2024

Journal ref: PMLR 235 (2024)

arXiv:2302.07557 [pdf, other]

On the Generalization of PINNs outside the training domain and the Hyperparameters influencing it

Authors: Andrea Bonfanti, Roberto Santana, Marco Ellero, Babak Gholami

Abstract: Physics-Informed Neural Networks (PINNs) are Neural Network architectures trained to emulate solutions of differential equations without the necessity of solution data. They are currently ubiquitous in the scientific literature due to their flexible and promising settings. However, very little of the available research provides practical studies that aim for a better quantitative understanding of such architecture and its functioning. In this paper, we perform an empirical analysis of the behavior of PINN predictions outside their training domain. The primary goal is to investigate the scenarios in which a PINN can provide consistent predictions outside the training area. Thereafter, we assess whether the algorithmic setup of PINNs can influence their potential for generalization and showcase the respective effect on the prediction. The results obtained in this study return insightful and at times counterintuitive perspectives which can be highly relevant for architectures which combine PINNs with domain decomposition and/or adaptive training strategies.

Submitted 24 August, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

arXiv:2207.10673 [pdf, other]

Correcting Model Bias with Sparse Implicit Processes

Authors: Simón Rodríguez Santana, Luis A. Ortega, Daniel Hernández-Lobato, Bryan Zaldívar

Abstract: Model selection in machine learning (ML) is a crucial part of the Bayesian learning procedure. Model choice may impose strong biases on the resulting predictions, which can hinder the performance of methods such as Bayesian neural networks and neural samplers. On the other hand, newly proposed approaches for Bayesian ML exploit features of approximate inference in function space with implicit stochastic processes (a generalization of Gaussian processes). The approach of Sparse Implicit Processes (SIP) is particularly successful in this regard, since it is fully trainable and achieves flexible predictions. Here, we expand on the original experiments to show that SIP is capable of correcting model bias when the data generating mechanism differs strongly from the one implied by the model. We use synthetic datasets to show that SIP is capable of providing predictive distributions that reflect the data better than the exact predictions of the initial, but wrongly assumed model.

Submitted 8 August, 2022; v1 submitted 21 July, 2022; originally announced July 2022.

Comments: 4 pages, 1 double figure. Included in ICML 2022 workshop "Beyond Bayes: Paths Towards Universal Reasoning Systems". Extension of previous work on Sparse Implicit Processes (arXiv:2110.07618)

arXiv:2207.05539 [pdf, other]

Refactoring Assertion Roulette and Duplicate Assert test smells: a controlled experiment

Authors: Railana Santana, Luana Martins, Tássio Virgínio, Larissa Soares, Heitor Costa, Ivan Machado

Abstract: Test smells can reduce the developers' ability to interact with the test code. Refactoring test code offers a safe strategy to handle test smells. However, the manual refactoring activity is not a trivial process, and it is often tedious and error-prone. This study aims to evaluate RAIDE, a tool for automatic identification and refactoring of test smells. We present an empirical assessment of RAIDE, in which we analyzed its capability at refactoring Assertion Roulette and Duplicate Assert test smells and compared the results against both manual refactoring and a state-of-the-art approach. The results show that RAIDE provides a faster and more intuitive approach for handling test smells than using an automated tool for smells detection combined with manual refactoring.

Submitted 12 July, 2022; originally announced July 2022.

Journal ref: XXV Ibero-American Conference on Software Engineering (CIbSE 2022)

arXiv:2206.10160 [pdf, other]

Predicting Parking Lot Availability by Graph-to-Sequence Model: A Case Study with SmartSantander

Authors: Yuya Sasaki, Junya Takayama, Juan Ramón Santana, Shohei Yamasaki, Tomoya Okuno, Makoto Onizuka

Abstract: Nowadays, so as to improve services and urban areas livability, multiple smart city initiatives are being carried out throughout the world. SmartSantander is a smart city project in Santander, Spain, which has relied on wireless sensor network technologies to deploy heterogeneous sensors within the city to measure multiple parameters, including outdoor parking information. In this paper, we study the prediction of parking lot availability using historical data from more than 300 outdoor parking sensors with SmartSantander. We design a graph-to-sequence model to capture the periodical fluctuation and geographical proximity of parking lots. For developing and evaluating our model, we use a 3-year dataset of parking lot availability in the city of Santander. Our model achieves a high accuracy compared with existing sequence-to-sequence models, which is accurate enough to provide a parking information service in the city. We apply our model to a smartphone application to be widely used by citizens and tourists.

Submitted 21 June, 2022; originally announced June 2022.

arXiv:2206.06720 [pdf, other]

Deep Variational Implicit Processes

Authors: Luis A. Ortega, Simón Rodríguez Santana, Daniel Hernández-Lobato

Abstract: Implicit processes (IPs) are a generalization of Gaussian processes (GPs). IPs may lack a closed-form expression but are easy to sample from. Examples include, among others, Bayesian neural networks or neural samplers. IPs can be used as priors over functions, resulting in flexible models with well-calibrated prediction uncertainty estimates. Methods based on IPs usually carry out function-space approximate inference, which overcomes some of the difficulties of parameter-space approximate inference. Nevertheless, the approximations employed often limit the expressiveness of the final model, resulting, e.g., in a Gaussian predictive distribution, which can be restrictive. We propose here a multi-layer generalization of IPs called the Deep Variational Implicit process (DVIP). This generalization is similar to that of deep GPs over GPs, but it is more flexible due to the use of IPs as the prior distribution over the latent functions. We describe a scalable variational inference algorithm for training DVIP and show that it outperforms previous IP-based methods and also deep GPs. We support these claims via extensive regression and classification experiments. We also evaluate DVIP on large datasets with up to several million data instances to illustrate its good scalability and performance.

Submitted 16 February, 2023; v1 submitted 14 June, 2022; originally announced June 2022.

Comments: 19 pages, 6 figures, ICLR 2023

arXiv:2204.01468 [pdf]

Criação e aplicação de ferramenta para auxiliar no ensino de algoritmos e programação de computadores

Authors: Afonso Henriques Fontes Neto Segundo, Joel Sotero da Cunha Neto, Maria Daniela Santabaia Cavalcanti, Paulo Cirillo Souza Barbosa, Raul Fontenele Santana

Abstract: Knowledge about programming is part of the knowledge matrix that will be required of the professionals of the future. Based on this, this work aims to report the development of a teaching tool developed during the monitoring program of the Algorithm and Computer Programming discipline of the University of Fortaleza. The tool combines the knowledge acquired in the books, with a language closer to the students, using video lessons and exercises proposed, with all the content available on the internet. The preliminary results were positive, with the students approving this new approach and believing that it could contribute to a better performance in the discipline.

Submitted 31 March, 2022; originally announced April 2022.

Comments: in Portuguese language

arXiv:2203.16927 [pdf]

Applying PBL in the Development and Modeling of kinematics for Robotic Manipulators with Interdisciplinarity between Computer-Assisted Project, Robotics, and Microcontrollers

Authors: Afonso Henriques Fontes Neto Segundo, Joel Sotero da Cunha Neto, Paulo Cirillo Souza Barbosa, Raul Fontenele Santana

Abstract: Considering the difficulty of students in calculating the direct and inverse kinematics of a robotic manipulator using only conventional tools of a classroom, this article proposes the application of Project Based Learning (ABP) through the design, development, and mathematical modeling of a robotic manipulator as an integrative project of the disciplines of Industrial Robotics, Microcontrollers and Computer Assisted Design with students of the Control and Automation Engineering of the University of Fortaleza. Once designed and machined, the manipulator arm was assembled using servo motors connected to a microcontrolled prototyping board, to then have its kinematics calculated. At the end are presented the results that the project has brought to the learning of the disciplines from the perspective of the tutor and students.

Submitted 31 March, 2022; originally announced March 2022.

Comments: in Portuguese language

arXiv:2203.16924 [pdf]

Development of a robotic manipulator: Applying interdisciplinarity in Computer Assister Project, Microcontrollers and Industrial Robotics

Authors: Afonso Henriques Fontes Neto Segundo, Joel Sotero da Cunha Neto, Reginaldo Florencio da Silva, Paulo Cirillo Souza Barbosa, Raul Fontenele Santana

Abstract: This work was conceived based on Project-Based Learning (ABP) and presents the design, development and mathematical modeling steps of a low-cost robotic manipulator with five degrees of freedom through an interdisciplinary project linking two very important disciplines of the course of Control Engineering and Automation of the University of Fortaleza: Computer Aided Design, Microcontrollers and Industrial Robotics. At the end are presented the results that the project has brought to the best learning of the discipline from the perspective of the tutor and students.

Submitted 31 March, 2022; originally announced March 2022.

Comments: in Portuguese language

arXiv:2111.08165 [pdf, other]

RapidRead: Global Deployment of State-of-the-art Radiology AI for a Large Veterinary Teleradiology Practice

Authors: Michael Fitzke, Conrad Stack, Andre Dourson, Rodrigo M. B. Santana, Diane Wilson, Lisa Ziemer, Arjun Soin, Matthew P. Lungren, Paul Fisher, Mark Parkinson

Abstract: This work describes the development and real-world deployment of a deep learning-based AI system for evaluating canine and feline radiographs across a broad range of findings and abnormalities. We describe a new semi-supervised learning approach that combines NLP-derived labels with self-supervised training leveraging more than 2.5 million x-ray images. Finally we describe the clinical deployment of the model including system architecture, real-time performance evaluation and data drift detection.

Submitted 9 November, 2021; originally announced November 2021.

arXiv:2110.07618 [pdf, other]

Function-space Inference with Sparse Implicit Processes

Authors: Simón Rodríguez Santana, Bryan Zaldivar, Daniel Hernández-Lobato

Abstract: Implicit Processes (IPs) represent a flexible framework that can be used to describe a wide variety of models, from Bayesian neural networks, neural samplers and data generators to many others. IPs also allow for approximate inference in function-space. This change of formulation solves intrinsic degenerate problems of parameter-space approximate inference concerning the high number of parameters and their strong dependencies in large models. For this, previous works in the literature have attempted to employ IPs both to set up the prior and to approximate the resulting posterior. However, this has proven to be a challenging task. Existing methods that can tune the prior IP result in a Gaussian predictive distribution, which fails to capture important data patterns. By contrast, methods producing flexible predictive distributions by using another IP to approximate the posterior process cannot tune the prior IP to the observed data. We propose here the first method that can accomplish both goals. For this, we rely on an inducing-point representation of the prior IP, as often done in the context of sparse Gaussian processes. The result is a scalable method for approximate inference with IPs that can tune the prior IP parameters to the data, and that provides accurate non-Gaussian predictive distributions.

Submitted 21 July, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

Comments: Published at ICML 2022 (long oral presentation). Code available at https://github.com/simonrsantana/sparse-implicit-processes

arXiv:2107.01943 [pdf, other]

When and How to Fool Explainable Models (and Humans) with Adversarial Examples

Authors: Jon Vadillo, Roberto Santana, Jose A. Lozano

Abstract: Reliable deployment of machine learning models such as neural networks continues to be challenging due to several limitations. Some of the main shortcomings are the lack of interpretability and the lack of robustness against adversarial examples or out-of-distribution inputs. In this exploratory review, we explore the possibilities and limits of adversarial attacks for explainable machine learning models. First, we extend the notion of adversarial examples to fit in explainable machine learning scenarios, in which the inputs, the output classifications and the explanations of the model's decisions are assessed by humans. Next, we propose a comprehensive framework to study whether (and how) adversarial examples can be generated for explainable models under human assessment, introducing and illustrating novel attack paradigms. In particular, our framework considers a wide range of relevant yet often ignored factors such as the type of problem, the user expertise or the objective of the explanations, in order to identify the attack strategies that should be adopted in each scenario to successfully deceive the model (and the human). The intention of these contributions is to serve as a basis for a more rigorous and realistic study of adversarial examples in the field of explainable machine learning.

Submitted 7 July, 2023; v1 submitted 5 July, 2021; originally announced July 2021.

Comments: Updated version. 43 pages, 9 figures, 4 tables

arXiv:2106.08972 [pdf, other]

Redefining Neural Architecture Search of Heterogeneous Multi-Network Models by Characterizing Variation Operators and Model Components

Authors: Unai Garciarena, Roberto Santana, Alexander Mendiburu

Abstract: With neural architecture search methods gaining ground on manually designed deep neural networks -even more rapidly as model sophistication escalates-, the research trend shifts towards arranging different and often increasingly complex neural architecture search spaces. In this conjuncture, delineating algorithms which can efficiently explore these search spaces can result in a significant improvement over currently used methods, which, in general, randomly select the structural variation operator, hoping for a performance gain. In this paper, we investigate the effect of different variation operators in a complex domain, that of multi-network heterogeneous neural models. These models have an extensive and complex search space of structures as they require multiple sub-networks within the general model in order to answer to different output types. From that investigation, we extract a set of general guidelines, whose application is not limited to that particular type of model, and are useful to determine the direction in which an architecture optimization method could find the largest improvement. To deduce the set of guidelines, we characterize both the variation operators, according to their effect on the complexity and performance of the model; and the models, relying on diverse metrics which estimate the quality of the different parts composing it.

Submitted 17 August, 2022; v1 submitted 16 June, 2021; originally announced June 2021.

MSC Class: 68T07 ACM Class: I.2.6

arXiv:2105.12836 [pdf, other]

On the Exploitation of Neuroevolutionary Information: Analyzing the Past for a More Efficient Future

Authors: Unai Garciarena, Nuno Lourenço, Penousal Machado, Roberto Santana, Alexander Mendiburu

Abstract: Neuroevolutionary algorithms, automatic searches of neural network structures by means of evolutionary techniques, are computationally costly procedures. In spite of this, due to the great performance provided by the architectures which are found, these methods are widely applied. The final outcome of neuroevolutionary processes is the best structure found during the search, and the rest of the procedure is commonly omitted in the literature. However, a good amount of residual information consisting of valuable knowledge that can be extracted is also produced during these searches. In this paper, we propose an approach that extracts this information from neuroevolutionary runs, and use it to build a metamodel that could positively impact future neural architecture searches. More specifically, by inspecting the best structures found during neuroevolutionary searches of generative adversarial networks with varying characteristics (e.g., based on dense or convolutional layers), we propose a Bayesian network-based model which can be used to either find strong neural structures right away, conveniently initialize different structural searches for different problems, or help future optimization of structures of any type to keep finding increasingly better structures where uninformed methods get stuck into local optima.

Submitted 26 May, 2021; originally announced May 2021.

arXiv:2105.01878 [pdf, other]

doi 10.1145/3316782.3322764

The EMPATHIC Project: Mid-term Achievements

Authors: M. I. Torres, J. M. Olaso, C. Montenegro, R. Santana, A. Vázquez, R. Justo, J. A. Lozano, S. Schlögl, G. Chollet, N. Dugan, M. Irvine, N. Glackin, C. Pickard, A. Esposito, G. Cordasco, A. Troncone, D. Petrovska-Delacretaz, A. Mtibaa, M. A. Hmani, M. S. Korsnes, L. J. Martinussen, S. Escalera, C. Palmero Cantariño, O. Deroo, O. Gordeeva, et al. (4 additional authors not shown)

Abstract: The goal of active aging is to promote changes in the elderly community so as to maintain an active, independent and socially-engaged lifestyle. Technological advancements currently provide the necessary tools to foster and monitor such processes. This paper reports on mid-term achievements of the European H2020 EMPATHIC project, which aims to research, innovate, explore and validate new interaction paradigms and platforms for future generations of personalized virtual coaches to assist the elderly and their carers to reach the active aging goal, in the vicinity of their home. The project focuses on evidence-based, user-validated research and integration of intelligent technology, and context sensing methods through automatic voice, eye and facial analysis, integrated with visual and spoken dialogue system capabilities. In this paper, we describe the current status of the system, with a special emphasis on its components and their integration, the creation of a Wizard of Oz platform, and findings gained from user interaction studies conducted throughout the first 18 months of the project.

Submitted 5 May, 2021; originally announced May 2021.

Comments: 12 pages

arXiv:2103.06138 [pdf, other]

Hybrid Model with Time Modeling for Sequential Recommender Systems

Authors: Marlesson R. O. Santana, Anderson Soares

Abstract: Deep learning based methods have been used successfully in recommender system problems. Approaches using recurrent neural networks, transformers, and attention mechanisms are useful to model users' long- and short-term preferences in sequential interactions. To explore different session-based recommendation solutions, Booking.com recently organized the WSDM WebTour 2021 Challenge, which aims to benchmark models to recommend the final city in a trip. This study presents our approach to this challenge. We conducted several experiments to test different state-of-the-art deep learning architectures for recommender systems. Further, we proposed some changes to Neural Attentive Recommendation Machine (NARM), adapted its architecture for the challenge objective, and implemented training approaches that can be used in any session-based model to improve accuracy. Our experimental result shows that the improved NARM outperforms all other state-of-the-art benchmark methods.

Submitted 7 March, 2021; originally announced March 2021.

Comments: 5 pages, 2 figures, WSDM Workshop on Web Tourism 2021

ACM Class: I.2.1; H.4.2

Journal ref: ACM WSDM Workshop on Web Tourism (WSDM Webtour'21), March 12, 2021, Jerusalem, Israel

arXiv:2012.14352 [pdf, other]

Analysis of Dominant Classes in Universal Adversarial Perturbations

Authors: Jon Vadillo, Roberto Santana, Jose A. Lozano

Abstract: The reasons why Deep Neural Networks are susceptible to being fooled by adversarial examples remain an open discussion. Indeed, many different strategies can be employed to efficiently generate adversarial attacks, some of them relying on different theoretical justifications. Among these strategies, universal (input-agnostic) perturbations are of particular interest, due to their capability to fool a network independently of the input in which the perturbation is applied. In this work, we investigate an intriguing phenomenon of universal perturbations, which has been reported previously in the literature, yet without a proven justification: universal perturbations change the predicted classes for most inputs into one particular (dominant) class, even if this behavior is not specified during the creation of the perturbation. In order to justify the cause of this phenomenon, we propose a number of hypotheses and experimentally test them using a speech command classification problem in the audio domain as a testbed. Our analyses reveal interesting properties of universal perturbations, suggest new methods to generate such attacks and provide an explanation of dominant classes, under both a geometric and a data-feature perspective.

Submitted 11 January, 2021; v1 submitted 28 December, 2020; originally announced December 2020.

Comments: 20 pages, 10 figures, 4 tables

arXiv:2010.07035 [pdf, other]

MARS-Gym: A Gym framework to model, train, and evaluate Recommender Systems for Marketplaces

Authors: Marlesson R. O. Santana, Luckeciano C. Melo, Fernando H. F. Camargo, Bruno Brandão, Anderson Soares, Renan M. Oliveira, Sandor Caetano

Abstract: Recommender Systems are especially challenging for marketplaces since they must maximize user satisfaction while maintaining the healthiness and fairness of such ecosystems. In this context, we observed a lack of resources to design, train, and evaluate agents that learn by interacting within these environments. For this matter, we propose MARS-Gym, an open-source framework to empower researchers and engineers to quickly build and evaluate Reinforcement Learning agents for recommendations in marketplaces. MARS-Gym addresses the whole development pipeline: data processing, model design and optimization, and multi-sided evaluation. We also provide the implementation of a diverse set of baseline agents, with a metrics-driven analysis of them in the Trivago marketplace dataset, to illustrate how to conduct a holistic assessment using the available metrics of recommendation, off-policy estimation, and fairness. With MARS-Gym, we expect to bridge the gap between academic research and production systems, as well as to facilitate the design of new algorithms and applications.

Submitted 30 September, 2020; originally announced October 2020.

Comments: 15 pages, 14 figures, see https://github.com/deeplearningbrasil/mars-gym

ACM Class: I.6.5; H.4.2

arXiv:2004.06383 [pdf, other]

Extending Adversarial Attacks to Produce Adversarial Class Probability Distributions

Authors: Jon Vadillo, Roberto Santana, Jose A. Lozano

Abstract: Despite the remarkable performance and generalization levels of deep learning models in a wide range of artificial intelligence tasks, it has been demonstrated that these models can be easily fooled by the addition of imperceptible yet malicious perturbations to natural inputs. These altered inputs are known in the literature as adversarial examples. In this paper, we propose a novel probabilistic framework to generalize and extend adversarial attacks in order to produce a desired probability distribution for the classes when we apply the attack method to a large number of inputs. This novel attack paradigm provides the adversary with greater control over the target model, thereby exposing, in a wide range of scenarios, threats against deep learning models that cannot be conducted by the conventional paradigms. We introduce four different strategies to efficiently generate such attacks, and illustrate our approach by extending multiple adversarial attack algorithms. We also experimentally validate our approach for the spoken command classification task and the Tweet emotion classification task, two exemplary machine learning problems in the audio and text domain, respectively. Our results demonstrate that we can closely approximate any probability distribution for the classes while maintaining a high fooling rate and even prevent the attacks from being detected by label-shift detection methods.

Submitted 25 January, 2023; v1 submitted 14 April, 2020; originally announced April 2020.

Comments: Final version as accepted in JMLR. Attribution requirements are provided at http://jmlr.org/papers/v24/21-0326.html

Journal ref: Journal of Machine Learning Research, 24(15):1-42, 2023

arXiv:2001.08444 [pdf, other]

On the human evaluation of audio adversarial examples

Authors: Jon Vadillo, Roberto Santana

Abstract: Human-machine interaction is increasingly dependent on speech communication. Machine Learning models are usually applied to interpret human speech commands. However, these models can be fooled by adversarial examples, which are inputs intentionally perturbed to produce a wrong prediction without being noticed. While much research has been focused on developing new techniques to generate adversarial perturbations, less attention has been given to aspects that determine whether and how the perturbations are noticed by humans. This question is relevant since high fooling rates of proposed adversarial perturbation strategies are only valuable if the perturbations are not detectable. In this paper we investigate to which extent the distortion metrics proposed in the literature for audio adversarial examples, and which are commonly applied to evaluate the effectiveness of methods for generating these attacks, are a reliable measure of the human perception of the perturbations. Using an analytical framework, and an experiment in which 18 subjects evaluate audio adversarial examples, we demonstrate that the metrics employed by convention are not a reliable measure of the perceptual similarity of adversarial examples in the audio domain.

Submitted 12 February, 2021; v1 submitted 23 January, 2020; originally announced January 2020.

Comments: Preprint. 17 pages, 7 figures, 4 tables

arXiv:1911.10182 [pdf, other]

Universal adversarial examples in speech command classification

Authors: Jon Vadillo, Roberto Santana

Abstract: Adversarial examples are inputs intentionally perturbed with the aim of forcing a machine learning model to produce a wrong prediction, while the changes are not easily detectable by a human. Although this topic has been intensively studied in the image domain, classification tasks in the audio domain have received less attention. In this paper we address the existence of universal perturbations for speech command classification. We provide evidence that universal attacks can be generated for speech command classification tasks, which are able to generalize across different models to a significant extent. Additionally, a novel analytical framework is proposed for the evaluation of universal perturbations under different levels of universality, demonstrating that the feasibility of generating effective perturbations decreases as the universality level increases. Finally, we propose a more detailed and rigorous framework to measure the amount of distortion introduced by the perturbations, demonstrating that the methods employed by convention are not realistic in audio-based problems.

Submitted 13 February, 2021; v1 submitted 22 November, 2019; originally announced November 2019.

Comments: 14 pages, 2 figures, 4 tables; Revised external links

arXiv:1910.05173 [pdf, other]

Evolving Gaussian Process kernels from elementary mathematical expressions

Authors: Ibai Roman, Roberto Santana, Alexander Mendiburu, Jose A. Lozano

Abstract: Choosing the most adequate kernel is crucial in many Machine Learning applications. Gaussian Process is a state-of-the-art technique for regression and classification that heavily relies on a kernel function. However, in the Gaussian Process literature, kernels have usually been either ad hoc designed, selected from a predefined set, or searched for in a space of compositions of kernels which have been defined a priori. In this paper, we propose a Genetic-Programming algorithm that represents a kernel function as a tree of elementary mathematical expressions. By means of this representation, a wider set of kernels can be modeled, where potentially better solutions can be found, although new challenges also arise. The proposed algorithm is able to overcome these difficulties and find kernels that accurately model the characteristics of the data. This method has been tested in several real-world time-series extrapolation problems, improving the state-of-the-art results while reducing the complexity of the kernels.

Submitted 14 October, 2019; v1 submitted 11 October, 2019; originally announced October 2019.

arXiv:1909.06945 [pdf, other]

doi 10.1016/j.neucom.2020.09.076

Adversarial $α$-divergence Minimization for Bayesian Approximate Inference

Authors: Simón Rodríguez Santana, Daniel Hernández-Lobato

Abstract: Neural networks are popular state-of-the-art models for many different tasks. They are often trained via back-propagation to find a value of the weights that correctly predicts the observed data. Although back-propagation has shown good performance in many applications, it cannot easily output an estimate of the uncertainty in the predictions made. Estimating the uncertainty in the predictions is a critical aspect with important applications, and one method to obtain this information is following a Bayesian approach to estimate a posterior distribution on the model parameters. This posterior distribution summarizes which parameter values are compatible with the data, but is usually intractable and has to be approximated. Several mechanisms have been considered for solving this problem. We propose here a general method for approximate Bayesian inference that is based on minimizing α-divergences and that allows for flexible approximate distributions. The method is evaluated in the context of Bayesian neural networks on extensive experiments. The results show that, in regression problems, it often gives better performance in terms of the test log-likelihood and sometimes in terms of the squared error. In classification problems, however, it gives competitive results.

Submitted 30 January, 2020; v1 submitted 13 September, 2019; originally announced September 2019.

Comments: 47 pages, 10 figures (41 pages for the main article, 6 for the supplementary material)

arXiv:1904.00977 [pdf, ps, other]

Sentiment analysis with genetically evolved Gaussian kernels

Authors: Ibai Roman, Alexander Mendiburu, Roberto Santana, Jose A. Lozano

Abstract: Sentiment analysis consists of evaluating opinions or statements from the analysis of text. Among the methods used to estimate the degree to which a text expresses a given sentiment are those based on Gaussian Processes. However, traditional Gaussian Process methods use a predefined kernel with hyperparameters that can be tuned but whose structure cannot be adapted. In this paper, we propose the application of Genetic Programming for evolving Gaussian Process kernels that are more precise for sentiment analysis. We use a very flexible representation of kernels combined with a multi-objective approach that simultaneously considers two quality metrics and the computational time spent by the kernels. Our results show that the algorithm can outperform Gaussian Processes with traditional kernels for some of the sentiment analysis tasks considered.

Submitted 14 October, 2019; v1 submitted 1 April, 2019; originally announced April 2019.

arXiv:1903.09171 [pdf, other]

Towards automatic construction of multi-network models for heterogeneous multi-task learning

Authors: Unai Garciarena, Alexander Mendiburu, Roberto Santana

Abstract: Multi-task learning, as it is understood nowadays, consists of using one single model to carry out several similar tasks. From classifying hand-written characters of different alphabets to figuring out how to play several Atari games using reinforcement learning, multi-task models have been able to widen their performance range across different tasks, although these tasks are usually of a similar nature. In this work, we attempt to widen this range even further, by including heterogeneous tasks in a single learning procedure. To do so, we first formally define a multi-network model, identifying the necessary components and characteristics to allow different adaptations of said model depending on the tasks it is required to fulfill. Second, employing the formal definition as a starting point, we develop an illustrative model example consisting of three different tasks (classification, regression and data sampling). The performance of this model implementation is then analyzed, showing its capabilities. Motivated by the results of the analysis, we enumerate a set of open challenges and future research lines over which the full potential of the proposed model definition can be exploited.

Submitted 21 March, 2019; originally announced March 2019.

Comments: Preprint

MSC Class: 68T99 ACM Class: I.2.6

arXiv:1806.09935 [pdf, ps, other]

On the performance of multi-objective estimation of distribution algorithms for combinatorial problems

Authors: Marcella S. R. Martins, Mohamed El Yafrani, Roberto Santana, Myriam Delgado, Ricardo Lüders, Belaïd Ahiod

Abstract: Fitness landscape analysis investigates features with a high influence on the performance of optimization algorithms, aiming to take advantage of the addressed problem characteristics. In this work, a fitness landscape analysis using problem features is performed for a Multi-objective Bayesian Optimization Algorithm (mBOA) on instances of the MNK-landscape problem for 2, 3, 5 and 8 objectives. We also compare the results of mBOA with those provided by NSGA-III through the analysis of their estimated runtime necessary to identify an approximation of the Pareto front. Moreover, in order to scrutinize the probabilistic graphical model obtained by mBOA, the Pareto front is examined according to a probabilistic view. The fitness landscape study shows that mBOA is moderately or loosely influenced by some problem features, according to a simple and a multiple linear regression model, which is being proposed to predict the algorithm's performance in terms of the estimated runtime. Besides, we conclude that the analysis of the probabilistic graphical model produced at the end of evolution can be useful to understand the convergence and diversity performances of the proposed approach.

Submitted 4 June, 2018; originally announced June 2018.

Comments: Accepted at IEEE WCCI/CEC 2018

arXiv:1801.04407 [pdf, other]

Towards a more efficient representation of imputation operators in TPOT

Authors: Unai Garciarena, Alexander Mendiburu, Roberto Santana

Abstract: Automated Machine Learning encompasses a set of meta-algorithms intended to design and apply machine learning techniques (e.g., model selection, hyperparameter tuning, model assessment, etc.). TPOT, a software tool for optimizing machine learning pipelines based on genetic programming (GP), is a novel example of this kind of application. Recently we have proposed a way to introduce imputation methods as part of TPOT. While our approach was able to deal with problems with missing data, it can produce a high number of unfeasible pipelines. In this paper we propose a strongly-typed-GP based approach that enforces constraint satisfaction by GP solutions. The enhancement we introduce is based on the redefinition of the operators and implicit enforcement of constraints in the generation of the GP trees. We evaluate the method to introduce imputation methods as part of TPOT. We show that the method can notably increase the efficiency of the GP search for optimal pipelines.

Submitted 13 January, 2018; originally announced January 2018.

Comments: 13 pages, 4 figures. Continuation of a previous work

MSC Class: 68T99 ACM Class: I.2.6

arXiv:1707.03093 [pdf, ps, other]

Gray-box optimization and factorized distribution algorithms: where two worlds collide

Authors: Roberto Santana

Abstract: The concept of gray-box optimization, in juxtaposition to black-box optimization, revolves around the idea of exploiting the problem structure to implement more efficient evolutionary algorithms (EAs). Work on factorized distribution algorithms (FDAs), whose factorizations are directly derived from the problem structure, has also contributed to show how exploiting the problem structure produces important gains in the efficiency of EAs. In this paper we analyze the general question of using problem structure in EAs focusing on confronting work done in gray-box optimization with related research accomplished in FDAs. This contrasted analysis helps us to identify, in current studies on the use of problem structure in EAs, two distinct analytical characterizations of how these algorithms work. Moreover, we claim that these two characterizations collide and compete at the time of providing a coherent framework to investigate this type of algorithm. To illustrate this claim, we present a contrasted analysis of formalisms, questions, and results produced in FDAs and gray-box optimization. Common underlying principles in the two approaches, which are usually overlooked, are identified and discussed. Besides, an extensive review of previous research related to different uses of the problem structure in EAs is presented.
The paper also elaborates on some of the questions that arise when extending the use of problem structure in EAs, such as the question of evolvability, high cardinality of the variables and large definition sets, constrained and multi-objective problems, etc. Finally, emergent approaches that exploit neural models to capture the problem structure are covered.

Submitted 10 July, 2017; originally announced July 2017.

Comments: 33 pages, 9 tables, 3 figures. This paper covers some of the topics of the talk "When the gray box was opened, model-based evolutionary algorithms were already there" presented in the Model-Based Evolutionary Algorithms workshop on July 20, 2016, in Denver

arXiv:1706.01120 [pdf, ps, other]

Evolving imputation strategies for missing data in classification problems with TPOT

Authors: Unai Garciarena, Roberto Santana, Alexander Mendiburu

Abstract: Missing data has a ubiquitous presence in real-life applications of machine learning techniques. Imputation methods are algorithms conceived for restoring missing values in the data, based on other entries in the database. The choice of the imputation method has an influence on the performance of the machine learning technique, e.g., it influences the accuracy of the classification algorithm applied to the data. Therefore, selecting and applying the right imputation method is important and usually requires a substantial amount of human intervention. In this paper we propose the use of genetic programming techniques to search for the right combination of imputation and classification algorithms. We build our work on the recently introduced Python-based TPOT library, and incorporate a heterogeneous set of imputation algorithms as part of the machine learning pipeline search. We show that genetic programming can automatically find increasingly better pipelines that include the most effective combinations of imputation methods, feature pre-processing, and classifiers for a variety of classification problems with missing data.

Submitted 14 August, 2017; v1 submitted 4 June, 2017; originally announced June 2017.

Comments: 15 pages, 4 figures

MSC Class: 65C99 ACM Class: D.2.2

arXiv:1702.05624 [pdf, ps, other]

Reproducing and learning new algebraic operations on word embeddings using genetic programming

Authors: Roberto Santana

Abstract: Word-vector representations associate a high-dimensional real vector to every word from a corpus. Recently, neural-network based methods have been proposed for learning this representation from large corpora. This type of word-to-vector embedding is able to keep, in the learned vector space, some of the syntactic and semantic relationships present in the original word corpus. This, in turn, serves to address different types of language classification tasks by doing algebraic operations defined on the vectors. The general practice is to assume that the semantic relationships between the words can be inferred by the application of a priori specified algebraic operations. Our general goal in this paper is to show that it is possible to learn methods for word composition in semantic spaces. Instead of expressing the compositional method as an algebraic operation, we will encode it as a program, which can be linear, nonlinear, or involve more intricate expressions. More remarkably, this program will be evolved from a set of initial random programs by means of genetic programming (GP). We show that our method is able to reproduce the same behavior as human-designed algebraic operators. Using a word analogy task as benchmark, we also show that GP-generated programs are able to obtain accuracy values above those produced by the commonly used human-designed rule for algebraic manipulation of word vectors.
Finally, we show the robustness of our approach by executing the evolved programs on the word2vec GoogleNews vectors, learned over 3 billion running words, and assessing their accuracy in the same word analogy task.

Submitted 18 February, 2017; originally announced February 2017.

Comments: 17 pages, 7 tables, 8 figures. Python code available from https://github.com/rsantana-isg/GP_word2vec

arXiv:1608.05105 [pdf, ps, other]

Evolutionary Approaches to Optimization Problems in Chimera Topologies

Authors: Roberto Santana, Zheng Zhu, Helmut G. Katzgraber

Abstract: Chimera graphs define the topology of one of the first commercially available quantum computers. A variety of optimization problems have been mapped to this topology to evaluate the behavior of quantum enhanced optimization heuristics in relation to other optimizers, being able to efficiently solve problems classically to use them as benchmarks for quantum machines. In this paper we investigate for the first time the use of Evolutionary Algorithms (EAs) on Ising spin glass instances defined on the Chimera topology. Three genetic algorithms (GAs) and three estimation of distribution algorithms (EDAs) are evaluated over $1000$ hard instances of the Ising spin glass constructed from Sidon sets. We focus on determining whether the information about the topology of the graph can be used to improve the results of EAs and on identifying the characteristics of the Ising instances that influence the success rate of GAs and EDAs.

Submitted 17 August, 2016; originally announced August 2016.

Comments: 8 pages, 5 figures, 3 tables

Journal ref: Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2016), ACM Press, 397-404 (2016)

arXiv:1512.03466 [pdf, ps, other]

Computing factorized approximations of Pareto-fronts using mNM-landscapes and Boltzmann distributions

Authors: Roberto Santana, Alexander Mendiburu, Jose A. Lozano

Abstract: NM-landscapes have been recently introduced as a class of tunable rugged models. They are a subset of the general interaction models where all the interactions are of order less than or equal to $M$. The Boltzmann distribution has been extensively applied in single-objective evolutionary algorithms to implement selection and study the theoretical properties of model-building algorithms. In this paper we propose the combination of the multi-objective NM-landscape model and the Boltzmann distribution to obtain Pareto-front approximations. We investigate the joint effect of the parameters of the NM-landscapes and the probabilistic factorizations on the shape of the Pareto-front approximations.

Submitted 10 December, 2015; originally announced December 2015.

Comments: Accepted for CAEPIA-2015 conference, Albacete, Spain. 11 pages, 3 figures

arXiv:1511.05625 [pdf, ps, other]

MOEA/D-GM: Using probabilistic graphical models in MOEA/D for solving combinatorial optimization problems

Authors: Murilo Zangari de Souza, Roberto Santana, Aurora Trinidad Ramirez Pozo, Alexander Mendiburu

Abstract: Evolutionary algorithms based on modeling the statistical dependencies (interactions) between the variables have been proposed to solve a wide range of complex problems. These algorithms learn and sample probabilistic graphical models able to encode and exploit the regularities of the problem. This paper investigates the effect of using probabilistic modeling techniques as a way to enhance the behavior of the MOEA/D framework. MOEA/D is a decomposition-based evolutionary algorithm that decomposes a multi-objective optimization problem (MOP) into a number of scalar single-objective subproblems and optimizes them in a collaborative manner. The MOEA/D framework has been widely used to solve several MOPs. The proposed algorithm, MOEA/D using probabilistic Graphical Models (MOEA/D-GM), is able to instantiate both univariate and multi-variate probabilistic models for each subproblem. To validate the introduced framework algorithm, an experimental study is conducted on a multi-objective version of the deceptive function Trap5. The results show that the variant of the framework (MOEA/D-Tree), where tree models are learned from the matrices of the mutual information between the variables, is able to capture the structure of the problem. MOEA/D-Tree is able to achieve significantly better results than both MOEA/D using genetic operators and MOEA/D using univariate probability models, in terms of the approximation to the true Pareto front.

Submitted 17 November, 2015; originally announced November 2015.

Comments: 13 pages, 4 figures

arXiv:1410.0602 [pdf, ps, other]

doi 10.1007/978-3-319-13563-2_2

A probabilistic evolutionary optimization approach to compute quasiparticle braids

Authors: Roberto Santana, Ross B. McDonald, Helmut G. Katzgraber

Abstract: Topological quantum computing is an alternative framework for avoiding the quantum decoherence problem in quantum computation. The problem of executing a gate in this framework can be posed as the problem of braiding quasiparticles. Because these are not Abelian, the problem can be reduced to finding an optimal product of braid generators where the optimality is defined in terms of the gate approximation and the braid's length. In this paper we propose the use of different variants of estimation of distribution algorithms to deal with the problem. Furthermore, we investigate how the regularities of the braid optimization problem can be translated into statistical regularities by means of the Boltzmann distribution. We show that our best algorithm is able to produce many solutions that approximate the target gate with an accuracy in the order of $10^{-6}$, and have lengths up to 9 times shorter than those expected from braids of the same accuracy obtained with other methods.

Submitted 2 October, 2014; originally announced October 2014.

Comments: 9 pages, 7 figures. Accepted at SEAL 2014

Journal ref: Simulated Evolution and Learning, Lecture Notes in Computer Science 8886, 13 (2014)

Showing 1–44 of 44 results for author: Santana, R