-
Intelligent Client Selection for Federated Learning using Cellular Automata
Authors:
Nikolaos Pavlidis,
Vasileios Perifanis,
Theodoros Panagiotis Chatzinikolaou,
Georgios Ch. Sirakoulis,
Pavlos S. Efraimidis
Abstract:
Federated Learning (FL) has emerged as a promising solution for privacy enhancement and latency minimization in various real-world applications, such as transportation, communications, and healthcare. FL endeavors to bring Machine Learning (ML) down to the edge by harnessing data from millions of devices and IoT sensors, thus enabling rapid responses to dynamic environments and yielding highly personalized results. However, the increased amount of sensors across diverse applications poses challenges in terms of communication and resource allocation, hindering the participation of all devices in the federated process and prompting the need for effective FL client selection. To address this issue, we propose Cellular Automaton-based Client Selection (CA-CS), a novel client selection algorithm, which leverages Cellular Automata (CA) as models to effectively capture spatio-temporal changes in a fast-evolving environment. CA-CS considers the computational resources and communication capacity of each participating client, while also accounting for inter-client interactions between neighbors during the client selection process, enabling intelligent client selection for online FL processes on data streams that closely resemble real-world scenarios. In this paper, we present a thorough evaluation of the proposed CA-CS algorithm using MNIST and CIFAR-10 datasets, while making a direct comparison against a uniformly random client selection scheme. Our results demonstrate that CA-CS achieves comparable accuracy to the random selection approach, while effectively avoiding high-latency clients.
Submitted 18 October, 2023; v1 submitted 1 October, 2023;
originally announced October 2023.
-
Towards Energy-Aware Federated Traffic Prediction for Cellular Networks
Authors:
Vasileios Perifanis,
Nikolaos Pavlidis,
Selim F. Yilmaz,
Francesc Wilhelmi,
Elia Guerra,
Marco Miozzo,
Pavlos S. Efraimidis,
Paolo Dini,
Remous-Aris Koutsiamanis
Abstract:
Cellular traffic prediction is a crucial activity for optimizing networks in fifth-generation (5G) networks and beyond, as accurate forecasting is essential for intelligent network design, resource allocation and anomaly mitigation. Although machine learning (ML) is a promising approach to effectively predict network traffic, the centralization of massive data in a single data center raises issues regarding confidentiality, privacy and data transfer demands. To address these challenges, federated learning (FL) emerges as an appealing ML training framework which offers highly accurate predictions through parallel distributed computations. However, the environmental impact of these methods is often overlooked, which calls into question their sustainability. In this paper, we address the trade-off between accuracy and energy consumption in FL by proposing a novel sustainability indicator that allows assessing the feasibility of ML models. Then, we comprehensively evaluate state-of-the-art deep learning (DL) architectures in a federated scenario using real-world measurements from base station (BS) sites in the area of Barcelona, Spain. Our findings indicate that larger ML models achieve marginally improved performance but have a significant environmental impact in terms of carbon footprint, which makes them impractical for real-world applications.
Submitted 19 September, 2023;
originally announced September 2023.
-
Federated Learning for Early Dropout Prediction on Healthy Ageing Applications
Authors:
Christos Chrysanthos Nikolaidis,
Vasileios Perifanis,
Nikolaos Pavlidis,
Pavlos S. Efraimidis
Abstract:
The provision of social care applications is crucial for elderly people to improve their quality of life and enables operators to provide early interventions. Accurate predictions of user dropouts in healthy ageing applications are essential since they are directly related to individual health statuses. Machine Learning (ML) algorithms have enabled highly accurate predictions, outperforming traditional statistical methods that struggle to cope with individual patterns. However, ML requires a substantial amount of data for training, which is challenging due to the presence of personally identifiable information (PII) and the fragmentation posed by regulations. In this paper, we present a federated machine learning (FML) approach that minimizes privacy concerns and enables distributed training, without transferring individual data. We employ collaborative training by considering individuals and organizations under FML, which models both cross-device and cross-silo learning scenarios. Our approach is evaluated on a real-world dataset with non-independent and identically distributed (non-iid) data among clients, class imbalance and label ambiguity. Our results show that data selection and class imbalance handling techniques significantly improve the predictive accuracy of models trained under FML, demonstrating predictive performance comparable or superior to that of traditional ML models.
Submitted 8 September, 2023;
originally announced September 2023.
-
Predicting Early Dropouts of an Active and Healthy Ageing App
Authors:
Vasileios Perifanis,
Ioanna Michailidi,
Giorgos Stamatelatos,
George Drosatos,
Pavlos S. Efraimidis
Abstract:
In this work, we present a machine learning approach for predicting early dropouts of an active and healthy ageing app. The presented algorithms have been submitted to the IFMBE Scientific Challenge 2022, part of IUPESM WC 2022. We have processed the given database and generated seven datasets. We used pre-processing techniques to construct classification models that predict the adherence of users using dynamic and static features. We submitted 11 official runs and our results show that machine learning algorithms can provide high-quality adherence predictions. Based on the results, the dynamic features positively influence a model's classification performance. Due to the imbalanced nature of the dataset, we employed oversampling methods such as SMOTE and ADASYN to improve the classification performance. The oversampling approaches led to a remarkable improvement of 10%. Our methods won first place in the IFMBE Scientific Challenge 2022.
Submitted 1 August, 2023;
originally announced August 2023.
-
Federated Learning for 5G Base Station Traffic Forecasting
Authors:
Vasileios Perifanis,
Nikolaos Pavlidis,
Remous-Aris Koutsiamanis,
Pavlos S. Efraimidis
Abstract:
Cellular traffic prediction is of great importance on the path of enabling 5G mobile networks to perform intelligent and efficient infrastructure planning and management. However, available data are limited to base station logging information. Hence, training methods for generating high-quality predictions that can generalize to new observations across diverse parties are in demand. Traditional approaches require collecting measurements from multiple base stations, transmitting them to a central entity and conducting machine learning operations using the acquired data. The dissemination of local observations raises concerns regarding confidentiality and performance, which impede the applicability of machine learning techniques. Although various distributed learning methods have been proposed to address this issue, their application to traffic prediction remains largely unexplored. In this work, we investigate the efficacy of federated learning applied to raw base station LTE data for time-series forecasting. We evaluate one-step predictions using five different neural network architectures trained with a federated setting on non-identically distributed data. Our results show that the learning architectures adapted to the federated setting yield equivalent prediction error to the centralized setting. In addition, preprocessing techniques on base stations enhance forecasting accuracy, while advanced federated aggregators do not surpass simpler approaches. Simulations considering the environmental impact suggest that federated learning holds the potential for reducing carbon emissions and energy consumption. Finally, we consider a large-scale scenario with synthetic data and demonstrate that federated learning reduces the computational and communication costs compared to centralized settings.
Submitted 26 August, 2023; v1 submitted 28 November, 2022;
originally announced November 2022.
-
FedPOIRec: Privacy Preserving Federated POI Recommendation with Social Influence
Authors:
Vasileios Perifanis,
George Drosatos,
Giorgos Stamatelatos,
Pavlos S. Efraimidis
Abstract:
With the growing number of Location-Based Social Networks, privacy preserving location prediction has become a primary task for helping users discover new points-of-interest (POIs). Traditional systems consider a centralized approach that requires the transmission and collection of users' private data. In this work, we present FedPOIRec, a privacy preserving federated learning approach enhanced with features from users' social circles for top-$N$ POI recommendations. First, the FedPOIRec framework is built on the principle that local data never leave the owner's device, while the local updates are blindly aggregated by a parameter server. Second, the local recommenders get personalized by allowing users to exchange their learned parameters, enabling knowledge transfer among friends. To this end, we propose a privacy preserving protocol for integrating the preferences of a user's friends after the federated computation, by exploiting the properties of the CKKS fully homomorphic encryption scheme. To evaluate FedPOIRec, we apply our approach to five real-world datasets using two recommendation models. Extensive experiments demonstrate that FedPOIRec achieves comparable recommendation quality to centralized approaches, while the social integration protocol incurs low computation and communication overhead on the user side.
Submitted 21 December, 2021;
originally announced December 2021.
-
An Exact, Linear Time Barabási-Albert Algorithm
Authors:
Giorgos Stamatelatos,
Pavlos S. Efraimidis
Abstract:
This paper presents the development of a new class of algorithms that accurately implement the preferential attachment mechanism of the Barabási-Albert (BA) model to generate scale-free graphs. Contrary to existing approximate preferential attachment schemes, our methods are exact in terms of the proportionality of the vertex selection probabilities to their degree and run in linear time with respect to the order of the generated graph. Our algorithms utilize a series of precise, diverse, weighted and unweighted random sampling steps to engineer the desired properties of the graph generator. We analytically show that they obey the definition of the original BA model that generates scale-free graphs and discuss their higher-order properties. The proposed methods additionally include options to manipulate one dimension of control over the joint inclusion of groups of vertices.
Submitted 30 March, 2022; v1 submitted 1 October, 2021;
originally announced October 2021.
-
Federated Neural Collaborative Filtering
Authors:
Vasileios Perifanis,
Pavlos S. Efraimidis
Abstract:
In this work, we present a federated version of the state-of-the-art Neural Collaborative Filtering (NCF) approach for item recommendations. The system, named FedNCF, enables learning without requiring users to disclose or transmit their raw data. Data localization preserves data privacy and complies with regulations such as the GDPR. Although federated learning enables model training without local data dissemination, the transmission of raw clients' updates raises additional privacy issues. To address this challenge, we incorporate a privacy-preserving aggregation method that satisfies the security requirements against an honest but curious entity. We argue theoretically and experimentally that existing aggregation algorithms are inconsistent with latent factor model updates. We propose an enhancement by decomposing the aggregation step into matrix factorization and neural network-based averaging. Experimental validation shows that FedNCF achieves comparable recommendation quality to the original NCF system, while our proposed aggregation leads to faster convergence compared to existing methods. We investigate the effectiveness of the federated recommender system and evaluate the privacy-preserving mechanism in terms of computational cost.
Submitted 16 February, 2022; v1 submitted 2 June, 2021;
originally announced June 2021.
-
Exploring Ethereum's Data Stores: A Cost and Performance Comparison
Authors:
P. Kostamis,
A. Sendros,
P. S. Efraimidis
Abstract:
The cost of using a blockchain infrastructure as well as the time required to search and retrieve information from it must be considered when designing a decentralized application. In this work, we examine a comprehensive set of data management approaches for Ethereum applications and assess the associated cost in gas as well as the retrieval performance. More precisely, we analyze the storage and retrieval of various-sized data, utilizing smart contract storage. In addition, we study hybrid approaches by using IPFS and Swarm as storage platforms along with Ethereum as a timestamping proof mechanism. Such schemes are especially effective when large chunks of data have to be managed. Moreover, we present methods for low-cost data handling in Ethereum, namely the event-logs, the transaction payload, and the almost surprising exploitation of unused function arguments. Finally, we evaluate these methods on a comprehensive set of experiments.
Submitted 21 May, 2021;
originally announced May 2021.
-
Lexicographic Enumeration of Set Partitions
Authors:
Giorgos Stamatelatos,
Pavlos S. Efraimidis
Abstract:
In this report, we summarize the set partition enumeration problems and thoroughly explain the algorithms used to solve them. These algorithms iterate through the partitions in lexicographic order and are easy to understand and implement in modern high-level programming languages, without recursive structures and jump logic. We show that they require linear space with respect to the set cardinality and advance the enumeration in constant amortized time. The methods discussed in this document are not novel. Our goal is to demonstrate the process of enumerating set partitions and highlight the ideas behind it. This work is an aid for learners approaching this enumeration problem and programmers undertaking the task of implementing it.
Submitted 16 May, 2021;
originally announced May 2021.
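As an illustration of the lexicographic enumeration described in the abstract above, the following is a minimal sketch based on the standard restricted growth string encoding, where `a[i]` is the block index of element `i` and may exceed the largest earlier index by at most one. The function names `set_partitions` and `to_blocks` are hypothetical, and this is not necessarily the exact algorithm presented in the report; it only shows that an iterative, non-recursive enumeration is possible.

```python
def set_partitions(n):
    """Yield the partitions of n elements as restricted growth strings,
    in lexicographic order, without recursion."""
    if n == 0:
        yield []
        return
    a = [0] * n  # a[i] = block index assigned to element i
    m = [0] * n  # m[i] = max(a[0..i-1]) for i >= 1 (m[0] is unused)
    while True:
        yield a[:]
        # Find the rightmost position that can still be incremented:
        # a[i] may grow only while a[i] <= m[i].
        i = n - 1
        while i > 0 and a[i] == m[i] + 1:
            i -= 1
        if i == 0:
            return  # a[0] is fixed at 0, so the enumeration is complete
        a[i] += 1
        # Reset the suffix to the smallest valid values and refresh maxima.
        for j in range(i + 1, n):
            a[j] = 0
            m[j] = max(m[j - 1], a[j - 1])


def to_blocks(rgs, items):
    """Convert a restricted growth string into explicit blocks of items."""
    blocks = [[] for _ in range(max(rgs, default=-1) + 1)]
    for item, block_index in zip(items, rgs):
        blocks[block_index].append(item)
    return blocks


if __name__ == "__main__":
    for rgs in set_partitions(3):
        print(rgs, to_blocks(rgs, "abc"))
```

Each yielded list is a copy, so callers can store results safely; a caller that only inspects each partition once could yield `a` directly to avoid the copy.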
-
About Weighted Random Sampling in Preferential Attachment Models
Authors:
Giorgos Stamatelatos,
Pavlos S. Efraimidis
Abstract:
The Barabási-Albert model is a popular scheme for creating scale-free graphs but has been previously shown to have ambiguities in its definition. In this paper we discuss a new ambiguity in the definition of the BA model by identifying the tight relation between the preferential attachment process and unequal probability random sampling. While the probability that each individual vertex is selected is set to be proportional to their degree, the model does not specify the joint probabilities that any tuple of $m$ vertices is selected together for $m>1$. We demonstrate the consequences using analytical, experimental, and empirical analyses and propose a concise definition of the model that addresses this ambiguity. Using the connection with unequal probability random sampling, we also highlight a confusion about the process via which nodes are selected on each time step, for which - despite being implicitly indicated in the original paper - current literature appears fragmented.
Submitted 13 October, 2021; v1 submitted 16 February, 2021;
originally announced February 2021.
-
On Money as a Means of Coordination between Network Packets
Authors:
Pavlos S. Efraimidis,
Remous-Aris Koutsiamanis
Abstract:
In this work, we apply a common economic tool, namely money, to coordinate network packets. In particular, we present a network economy, called PacketEconomy, where each flow is modeled as a population of rational network packets, and these packets can self-regulate their access to network resources by mutually trading their positions in router queues. Every packet of the economy has its price, and this price determines if and when the packet will agree to buy or sell a better position. We consider a corresponding Markov model of trade and show that there are Nash equilibria (NE) where queue positions and money are exchanged directly between the network packets. This simple approach, interestingly, delivers improvements even when fiat money is used. We present theoretical arguments and experimental results to support our claims.
Submitted 18 August, 2012;
originally announced August 2012.
-
An exact and O(1) time heaviest and lightest hitters algorithm for sliding-window data streams
Authors:
Remous-Aris Koutsiamanis,
Pavlos S. Efraimidis
Abstract:
In this work we focus on the problem of finding the heaviest-k and lightest-k hitters in a sliding window data stream. The most recent research endeavours have yielded an epsilon-approximate algorithm with update operations in constant time with high probability and O(1/epsilon) query time for the heaviest hitters case. We propose a novel algorithm which for the first time, to our knowledge, provides exact, not approximate, results while achieving O(1) time complexity with high probability for both update and query operations. Furthermore, our algorithm is able to provide both the heaviest-k and the lightest-k hitters at the same time without any overhead. In this work, we describe the algorithm and the accompanying data structure that supports it and perform quantitative experiments with synthetic data to verify our theoretical predictions.
Submitted 1 March, 2011;
originally announced March 2011.
-
(α, β) Fibonacci Search
Authors:
Pavlos S. Efraimidis
Abstract:
Knuth [12, Page 417] states that "the (program of the) Fibonaccian search technique looks very mysterious at first glance" and that "it seems to work by magic". In this work, we show that there is even more magic in Fibonaccian (or else Fibonacci) search. We present a generalized Fibonacci procedure that follows perfectly the implicit optimal decision tree for search problems where the cost of each comparison depends on its outcome.
Submitted 1 December, 2010;
originally announced December 2010.
-
Weighted Random Sampling over Data Streams
Authors:
Pavlos S. Efraimidis
Abstract:
In this work, we present a comprehensive treatment of weighted random sampling (WRS) over data streams. More precisely, we examine two natural interpretations of the item weights, describe an existing algorithm for each case ([2, 4]), discuss sampling with and without replacement and show adaptations of the algorithms for several WRS problems and evolving data streams.
Submitted 28 July, 2015; v1 submitted 1 December, 2010;
originally announced December 2010.
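To make the idea of weighted random sampling over a stream concrete, here is a minimal sketch of the widely known key-based reservoir scheme for sampling without replacement: each item with weight w draws u uniformly from [0, 1), receives the key u^(1/w), and the k items with the largest keys are kept. Whether this corresponds exactly to either of the algorithms cited as [2, 4] in the abstract is an assumption; the function name `weighted_reservoir_sample` and its parameters are hypothetical.

```python
import heapq
import random


def weighted_reservoir_sample(stream, k, rng=None):
    """Return up to k items sampled without replacement from an iterable of
    (item, weight) pairs, in a single pass, using keys u ** (1 / weight)."""
    if k <= 0:
        return []
    rng = rng or random.Random()
    heap = []  # min-heap of (key, arrival_index, item); root has the smallest key
    for index, (item, weight) in enumerate(stream):
        if weight <= 0:
            continue  # assumption: non-positive weights are never selected
        key = rng.random() ** (1.0 / weight)
        entry = (key, index, item)  # index breaks ties without comparing items
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif key > heap[0][0]:
            # The new key beats the smallest key in the reservoir: swap it in.
            heapq.heapreplace(heap, entry)
    return [item for _, _, item in heap]


if __name__ == "__main__":
    data = [("a", 1.0), ("b", 2.0), ("c", 10.0), ("d", 0.5)]
    print(weighted_reservoir_sample(data, 2, random.Random(7)))
```

The min-heap keeps memory at O(k) regardless of stream length, and passing a seeded `random.Random` makes a run reproducible.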