-
Overcoming Catastrophic Forgetting in Tabular Data Classification: A Pseudorehearsal-based approach
Authors:
Pablo García-Santaclara,
Bruno Fernández-Castro,
Rebeca P. Díaz-Redondo
Abstract:
Continual learning (CL) poses the important challenge of adapting to evolving data distributions without forgetting previously acquired knowledge while consolidating new knowledge. In this paper, we introduce a new methodology, coined as Tabular-data Rehearsal-based Incremental Lifelong Learning framework (TRIL3), designed to address the phenomenon of catastrophic forgetting in tabular data classification problems. TRIL3 uses the prototype-based incremental generative model XuILVQ to generate synthetic data to preserve old knowledge and the DNDF algorithm, which was modified to run in an incremental way, to learn classification tasks for tabular data, without storing old samples. After different tests to obtain the adequate percentage of synthetic data and to compare TRIL3 with other available CL proposals, we conclude that TRIL3 outperforms other options in the literature using only 50% of synthetic data.
Submitted 12 July, 2024;
originally announced July 2024.
-
Generation of BIM data based on the automatic detection, identification and localization of lamps in buildings
Authors:
Francisco Troncoso-Pastoriza,
Pablo Eguía-Oller,
Rebeca P. Díaz-Redondo,
Enrique Granada-Álvarez
Abstract:
In this paper we introduce a method that supports the detection, identification and localization of lamps in a building, with the main goal of automatically feeding its energy model by means of Building Information Modeling (BIM) methods. The proposed method, thus, provides useful information to apply energy-saving strategies to reduce energy consumption in the building sector through the correct management of the lighting infrastructure. Based on the unique geometry and brightness of lamps and the use of only greyscale images, our methodology is able to obtain accurate results despite its low computational needs, resulting in near-real-time processing. The main novelty is that the focus of the candidate search is not over the entire image but instead only on a limited region that summarizes the specific characteristics of the lamp. The information obtained from our approach was used on the Green Building XML Schema to illustrate the automatic generation of BIM data from the results of the algorithm.
Submitted 18 December, 2023;
originally announced January 2024.
-
Deep Learning-based Sentiment Classification: A Comparative Survey
Authors:
Mohamed Kayed,
Rebeca P. Díaz-Redondo,
Alhassan Mabrouk
Abstract:
Recently, Deep Learning (DL) approaches have been applied to solve the Sentiment Classification (SC) problem, which is a core task in review mining or Sentiment Analysis (SA). The performance of these approaches is affected by different factors. This paper addresses these factors and classifies them into three categories: data preparation based factors, feature representation based factors and classification technique based factors. The paper is a comprehensive literature-based survey that compares the performance of more than 100 DL-based SC approaches by using 21 public datasets of reviews given by customers within three specific application domains (products, movies and restaurants). These 21 datasets have different characteristics (balanced/imbalanced, size, etc.) to give a global vision for our study. The comparison explains how the proposed factors quantitatively affect the performance of the studied DL-based SC approaches.
Submitted 12 December, 2023;
originally announced December 2023.
-
SEOpinion: Summarization and Exploration Opinion of E-Commerce Websites
Authors:
Alhassan Mabrouk,
Rebeca P. Díaz-Redondo,
Mohammed Kayed
Abstract:
E-Commerce (EC) websites provide a large amount of useful information that exceeds human cognitive processing ability. In order to help customers compare alternatives when buying a product, previous studies designed opinion summarization systems based on customer reviews. They ignored the template information provided by manufacturers, although this descriptive information covers many product aspects or characteristics. Therefore, this paper proposes a methodology coined as SEOpinion (Summarization and Exploration of Opinions) which provides a summary of the product aspects and spots opinion(s) regarding them, using a combination of the template information with the customer reviews in two main phases. First, the Hierarchical Aspect Extraction (HAE) phase creates a hierarchy of product aspects from the template. Subsequently, the Hierarchical Aspect-based Opinion Summarization (HAOS) phase enriches this hierarchy with customers' opinions, to be shown to other potential buyers. To test the feasibility of using Deep Learning-based BERT techniques with our approach, we have created a corpus by gathering information from the top five EC websites for laptops. The experimental results show that Recurrent Neural Network (RNN) achieves better results (77.4% and 82.6% in terms of F1-measure for the first and second phase) than the Convolutional Neural Network (CNN) and the Support Vector Machine (SVM) technique.
Submitted 12 December, 2023;
originally announced December 2023.
-
Decentralised and collaborative machine learning framework for IoT
Authors:
Martín González-Soto,
Rebeca P. Díaz-Redondo,
Manuel Fernández-Veiga,
Bruno Rodríguez-Castro,
Ana Fernández-Vilas
Abstract:
Decentralised machine learning has recently been proposed as a potential solution to the security issues of the canonical federated learning approach. In this paper, we propose a decentralised and collaborative machine learning framework specially oriented to resource-constrained devices, which are common in IoT deployments. With this aim we propose the following building blocks. First, an incremental learning algorithm based on prototypes that was specifically implemented to work in low-performance computing elements. Second, two random-based protocols to exchange the local models among the computing elements in the network. Finally, two algorithmic approaches for prediction and prototype creation. This proposal was compared to a typical centralized incremental learning approach in terms of accuracy, training time and robustness, with very promising results.
Submitted 19 December, 2023;
originally announced December 2023.
-
Orientation-Constrained System for Lamp Detection in Buildings Based on Computer Vision
Authors:
Francisco Troncoso-Pastoriza,
Pablo Eguía-Oller,
Rebeca P. Díaz-Redondo,
Enrique Granada-Álvarez,
Aitor Erkoreka
Abstract:
Computer vision is used in this work to detect lighting elements in buildings with the goal of improving the accuracy of previous methods to provide a precise inventory of the location and state of lamps. Using the framework developed in our previous works, we introduce two new modifications to enhance the system: first, a constraint on the orientation of the detected poses in the optimization methods for both the initial and the refined estimates based on the geometric information of the building information modelling (BIM) model; second, an additional reprojection error filtering step to discard the erroneous poses introduced with the orientation restrictions, keeping the identification and localization errors low while greatly increasing the number of detections. These enhancements are tested in five different case studies with more than 30,000 images, with results showing improvements in the number of detections, the percentage of correct model and state identifications, and the distance between detections and reference positions.
Submitted 18 December, 2023;
originally announced December 2023.
-
Use of BIM Data as Input and Output for Improved Detection of Lighting Elements in Buildings
Authors:
Francisco Troncoso-Pastoriza,
Pablo Eguía-Oller,
Rebeca P. Díaz-Redondo,
Enrique Granada-Álvarez
Abstract:
This paper introduces a complete method for the automatic detection, identification and localization of lighting elements in buildings, leveraging the available building information modeling (BIM) data of a building and feeding the BIM model with the new collected information, which is key for energy-saving strategies. The detection system is heavily improved from our previous work, with the following two main contributions: (i) a new refinement algorithm to provide a better detection rate and identification performance with comparable computational resources and (ii) a new plane estimation, filtering and projection step to leverage the BIM information earlier for lamps that are both hanging and embedded. The two modifications are thoroughly tested in five different case studies, yielding better results in terms of detection, identification and localization.
Submitted 18 December, 2023;
originally announced December 2023.
-
Discovering Geo-dependent Stories by Combining Density-based Clustering and Thread-based Aggregation techniques
Authors:
Héctor Cerezo-Costas,
Ana Fernández Vilas,
Manuela Martín-Vicente,
Rebeca P. Díaz-Redondo
Abstract:
Citizens are actively interacting with their surroundings, especially through social media. Not only do shared posts give important information about what is happening (from the users' perspective), but the metadata linked to these posts also offers relevant data, such as the GPS location in Location-based Social Networks (LBSNs). In this paper we introduce a global analysis of the geo-tagged posts in social media which supports (i) the detection of unexpected behavior in the city and (ii) the analysis of the posts to infer what is happening. The former is obtained by applying density-based clustering techniques, whereas the latter is a consequence of applying natural language processing. We have applied our methodology to a dataset obtained from Instagram activity in New York City for seven months, obtaining promising results. The developed algorithms require very low resources, being able to analyze millions of data points on commodity hardware in less than one hour without applying complex parallelization techniques. Furthermore, the solution can be easily adapted to other geo-tagged data sources without extra effort.
Submitted 18 December, 2023;
originally announced December 2023.
-
Resource Allocation for Dataflow Applications in FANETs using Anypath Routing
Authors:
Juan José López Escobar,
Manuel Ricardo,
Rui Campos,
Felipe Gil-Castiñeira,
Rebeca P. Díaz-Redondo
Abstract:
Management of network resources in advanced IoT applications is a challenging topic due to their distributed nature from the Edge to the Cloud, and the heavy demand for real-time data from many sources to take action in the deployment. FANETs (Flying Ad-hoc Networks) are a clear example of heterogeneous multi-modal use cases, which require strict quality in the network communications, as well as the coordination of the computing capabilities, in order to operate the final service correctly. In this paper, we present a Virtual Network Embedding (VNE) framework designed for the allocation of dataflow applications, composed of nano-services that produce or consume data, in a wireless infrastructure, such as an airborne network. To address the problem, an anypath-based heuristic algorithm that considers the quality demand of the communication between nano-services is proposed, coined as Quality-Revenue Paired Anypath Dataflow VNE (QRPAD-VNE). We also provide a simulation environment for the evaluation of its performance according to the virtual network (VN) request load in the system. Finally, we show the suitability of a multi-parameter framework in conjunction with anypath routing in order to have better performance results that guarantee minimum quality in the wireless communications.
Submitted 12 December, 2023;
originally announced December 2023.
-
JMAC Protocol: A Cross-Layer Multi-Hop Protocol for LoRa
Authors:
Juan José López Escobar,
Felipe Gil-Castiñeira,
Rebeca P. Díaz-Redondo
Abstract:
The emergence of Low-Power Wide-Area Network (LPWAN) technologies allowed the development of revolutionary Internet of Things (IoT) applications covering large areas with thousands of devices. However, connectivity may be a challenge for non-line-of-sight indoor operation or for areas without good coverage. Technologies such as LoRa and Sigfox allow connectivity for up to 50,000 devices per cell, a number of devices that may be exceeded in many scenarios. To deal with these problems, this paper introduces a new multi-hop protocol, called JMAC, designed for improving long-range wireless communication networks that may support monitoring in scenarios such as smart cities or Industry 4.0. JMAC uses the LoRa radio technology to keep consumption low and extend the coverage area, and exploits the potential mesh behaviour of wireless networks to improve coverage and increase the number of supported devices per cell. JMAC is based on predictive wake-up to achieve a long lifetime on sensor devices. Our proposal was validated using the OMNeT++ simulator to analyze how it performs under different conditions, with promising results.
Submitted 12 December, 2023;
originally announced December 2023.
-
A hybrid analysis of LBSN data to early detect anomalies in crowd dynamics
Authors:
Rebeca P. Díaz-Redondo,
Carlos Garcia-Rubio,
Ana Fernández Vilas,
Celeste Campo,
Alicia Rodriguez-Carrion
Abstract:
Undoubtedly, Location-based Social Networks (LBSNs) provide an interesting source of geo-located data that we have previously used to obtain patterns of the dynamics of crowds throughout urban areas. According to our previous results, activity in LBSNs reflects the real activity in the city. Therefore, unexpected behaviors in the social media activity are reliable evidence of unexpected changes in the activity in the city. In this paper we introduce a hybrid solution for the early detection of these changes, based on applying a combination of two approaches, entropy analysis and clustering techniques, to the data gathered from LBSNs. In particular, we have performed our experiments over a data set collected from Instagram for seven months in New York City, obtaining promising results.
Submitted 13 December, 2023;
originally announced December 2023.
-
Unsupervised KPIs-Based Clustering of Jobs in HPC Data Centers
Authors:
Mohamed S. Halawa,
Rebeca P. Díaz-Redondo,
Ana Fernández-Vilas
Abstract:
Performance analysis is an essential task in High-Performance Computing (HPC) systems and it is applied for different purposes such as anomaly detection, optimal resource allocation, and budget planning. HPC monitoring tasks generate a huge number of Key Performance Indicators (KPIs) to supervise the status of the jobs running in these systems. KPIs give data about CPU usage, memory usage, network (interface) traffic, or other sensors that monitor the hardware. By analyzing this data, it is possible to obtain insightful information about running jobs, such as their characteristics, performance, and failures. The main contribution of this paper is to identify which metrics (KPIs) are the most appropriate to identify and classify different types of jobs according to their behavior in the HPC system. With this aim, we have applied different clustering techniques (partition and hierarchical clustering algorithms) using a real dataset from the Galician Computation Center (CESGA). We have concluded that (i) those metrics (KPIs) related to the network (interface) traffic monitoring provide the best cohesion and separation to cluster HPC jobs, and (ii) hierarchical clustering algorithms are the most suitable for this task. Our approach was validated using a different real dataset from the same HPC center.
Submitted 11 December, 2023;
originally announced December 2023.
-
KPIs-Based Clustering and Visualization of HPC jobs: a Feature Reduction Approach
Authors:
Mohamed Soliman Halawa,
Rebeca P. Díaz-Redondo,
Ana Fernández-Vilas
Abstract:
High-Performance Computing (HPC) systems need to be constantly monitored to ensure their stability. The monitoring systems collect a tremendous amount of data about different parameters or Key Performance Indicators (KPIs), such as resource usage, IO waiting time, etc. A proper analysis of this data, usually stored as time series, can provide insight for choosing the right management strategies as well as for the early detection of issues. In this paper, we introduce a methodology to cluster HPC jobs according to their KPIs. Our approach reduces the inherent high dimensionality of the collected data by applying two techniques to the time series: literature-based and variance-based feature extraction. We also define a procedure to visualize the obtained clusters by combining the two previous approaches and Principal Component Analysis (PCA). Finally, we have validated our contributions on a real data set, concluding that those KPIs related to CPU usage provide the best cohesion and separation for clustering analysis and confirming the good results of our visualization methodology.
Submitted 11 December, 2023;
originally announced December 2023.
-
Irregular Repetition Slotted Aloha with Multipacket Detection: A Density Evolution Analysis
Authors:
Manuel Fernández-Veiga,
M. E. Sousa-Vieira,
Ana Fernández-Vilas,
Rebeca P Díaz-Redondo
Abstract:
Irregular repetition slotted Aloha (IRSA) has shown significant advantages as a modern technique for uncoordinated random access with a massive number of users due to its capability of theoretically achieving a throughput of $1$ packet per slot. When the receiver also has the multi-packet reception or multi-user detection (MUD) property, by applying successive interference cancellation, IRSA also obtains very low packet loss probabilities at low traffic loads, but is unable in general to achieve a normalized throughput close to $1$. In this paper, we reconsider the case of IRSA with $k$-MUD receivers and derive the general density evolution equations for the non-asymptotic analysis of the packet loss rate, for arbitrary frame lengths and two variants of the first slot used for transmission. Next, using the potential function, we give new bounds on the capacity of the system, showing the threshold arrival rate for zero decoding error probability. Our numerical results illustrate performance in terms of throughput and average delay for $k$-MUD IRSA with finite memory at the receiver, and also with bounded maximum delay.
Submitted 11 December, 2023;
originally announced December 2023.
-
Integrating micro-learning content in traditional e-learning platforms
Authors:
Rebeca P. Díaz-Redondo,
Manuel Caeiro-Rodríguez,
Juan José López-Escobar,
Ana Fernández-Vilas
Abstract:
Lifelong learning requires appropriate solutions, especially for corporate training. Workers usually have difficulty combining training and their normal work. In this context, micro-learning emerges as a suitable solution, since it is based on breaking down new concepts into small fragments or pills of content, which can be consumed in short periods of time. The purpose of this paper is twofold. First, we offer an updated overview of the research on this training paradigm, as well as the different technologies leading to potential commercial solutions. Second, we introduce a proposal to add micro-learning content to more formal distance learning environments (traditional Learning Management Systems or LMS), with the aim of taking advantage of both learning philosophies. Our approach is based on a Service-Oriented Architecture (SOA) that is deployed in the cloud. In order to ensure the full integration of the micro-learning approach in traditional LMSs, we have used two well-known standards in the distance learning field: LTI (Learning Tools Interoperability) and LIS (Learning Information Service). The combination of these two technologies allows the exchange of data with the LMS to monitor the student's activity and results. Finally, we have collected the opinions of lecturers from different countries in order to learn their thoughts about the potential of this new approach in higher education, obtaining positive feedback.
Submitted 11 December, 2023;
originally announced December 2023.
-
A Blockchain Solution for Collaborative Machine Learning over IoT
Authors:
Carlos Beis-Penedo,
Francisco Troncoso-Pastoriza,
Rebeca P. Díaz-Redondo,
Ana Fernández-Vilas,
Manuel Fernández-Veiga,
Martín González Soto
Abstract:
The rapid growth of Internet of Things (IoT) devices and applications has led to an increased demand for advanced analytics and machine learning techniques capable of handling the challenges associated with data privacy, security, and scalability. Federated learning (FL) and blockchain technologies have emerged as promising approaches to address these challenges by enabling decentralized, secure, and privacy-preserving model training on distributed data sources. In this paper, we present a novel IoT solution that combines the incremental learning vector quantization algorithm (XuILVQ) with Ethereum blockchain technology to facilitate secure and efficient data sharing, model training, and prototype storage in a distributed environment. Our proposed architecture addresses the shortcomings of existing blockchain-based FL solutions by reducing computational and communication overheads while maintaining data privacy and security. We assess the performance of our system through a series of experiments, showcasing its potential to enhance the accuracy and efficiency of machine learning tasks in IoT settings.
Submitted 23 November, 2023;
originally announced November 2023.