-
Semantic-Aware Resource Allocation in Constrained Networks with Limited User Participation
Authors:
Ouiame Marnissi,
Hajar EL Hammouti,
El Houcine Bergou
Abstract:
Semantic communication has gained attention as a key enabler for intelligent and context-aware communication. However, one of the key challenges of semantic communications is the need to tailor the resource allocation to meet the specific requirements of semantic transmission. In this paper, we focus on networks with limited resources where devices are constrained to transmit with limited bandwidth and power over large distances. Specifically, we devise an efficient strategy to select the most pertinent semantic features and participating users, taking into account the channel quality, the transmission time, and the recovery accuracy. To this end, we formulate an optimization problem with the goal of selecting the most relevant and accurate semantic features over devices while satisfying constraints on transmission time and quality of the channel. This involves optimizing communication resources, identifying participating users, and choosing specific semantic information for transmission. The underlying problem is inherently complex due to its non-convex nature and combinatorial constraints. To overcome this challenge, we efficiently approximate the optimal solution by solving a series of integer linear programming problems. Our numerical findings illustrate the effectiveness and efficiency of our approach in managing semantic communications in networks with limited resources.
Submitted 19 January, 2024;
originally announced January 2024.
-
Joint Probability Selection and Power Allocation for Federated Learning
Authors:
Ouiame Marnissi,
Hajar EL Hammouti,
El Houcine Bergou
Abstract:
In this paper, we study the performance of federated learning over wireless networks, where devices with a limited energy budget train a machine learning model. The federated learning performance depends on the selection of the clients participating in the learning at each round. Most existing studies suggest deterministic approaches for the client selection, resulting in challenging optimization problems that are usually solved using heuristics, and therefore without guarantees on the quality of the final solution. We formulate a new probabilistic approach to jointly select clients and allocate power optimally so that the expected number of participating clients is maximized. To solve the problem, a new alternating algorithm is proposed, where at each step, the closed-form solutions for user selection probabilities and power allocations are obtained. Our numerical results show that the proposed approach achieves a significant performance in terms of energy consumption, completion time and accuracy as compared to the studied benchmarks.
Submitted 15 January, 2024;
originally announced January 2024.
-
Demystifying the Myths and Legends of Nonconvex Convergence of SGD
Authors:
Aritra Dutta,
El Houcine Bergou,
Soumia Boucherouite,
Nicklas Werge,
Melih Kandemir,
Xin Li
Abstract:
Stochastic gradient descent (SGD) and its variants are the main workhorses for solving large-scale optimization problems with nonconvex objective functions. Although the convergence of SGDs in the (strongly) convex case is well-understood, their convergence for nonconvex functions stands on weak mathematical foundations. Most existing studies on the nonconvex convergence of SGD show the complexity results based on either the minimum of the expected gradient norm or the functional sub-optimality gap (for functions with extra structural property) by searching the entire range of iterates. Hence the last iterations of SGDs do not necessarily maintain the same complexity guarantee. This paper shows that an $ε$-stationary point exists in the final iterates of SGDs, given a large enough total iteration budget, $T$, not just anywhere in the entire range of iterates -- a much stronger result than the existing one. Additionally, our analyses allow us to measure the density of the $ε$-stationary points in the final iterates of SGD, and we recover the classical $O(\frac{1}{\sqrt{T}})$ asymptotic rate under various existing assumptions on the objective function and the bounds on the stochastic gradient. As a result of our analyses, we addressed certain myths and legends related to the nonconvex convergence of SGD and posed some thought-provoking questions that could set new directions for research.
Submitted 19 October, 2023;
originally announced October 2023.
-
Ensemble DNN for Age-of-Information Minimization in UAV-assisted Networks
Authors:
Mouhamed Naby Ndiaye,
El Houcine Bergou,
Hajar El Hammouti
Abstract:
This paper addresses the problem of Age-of-Information (AoI) in UAV-assisted networks. Our objective is to minimize the expected AoI across devices by optimizing UAVs' stopping locations and device selection probabilities. To tackle this problem, we first derive a closed-form expression of the expected AoI that involves the probabilities of selection of devices. Then, we formulate the problem as a non-convex minimization subject to quality of service constraints. Since the problem is challenging to solve, we propose an Ensemble Deep Neural Network (EDNN) based approach which takes advantage of the dual formulation of the studied problem. Specifically, the Deep Neural Networks (DNNs) in the ensemble are trained in an unsupervised manner using the Lagrangian function of the studied problem. Our experiments show that the proposed EDNN method outperforms traditional DNNs in reducing the expected AoI, achieving a remarkable reduction of $29.5\%$.
Submitted 6 September, 2023;
originally announced September 2023.
-
A Note on Randomized Kaczmarz Algorithm for Solving Doubly-Noisy Linear Systems
Authors:
El Houcine Bergou,
Soumia Boucherouite,
Aritra Dutta,
Xin Li,
Anna Ma
Abstract:
Large-scale linear systems, $Ax=b$, frequently arise in practice and demand effective iterative solvers. Often, these systems are noisy due to operational errors or faulty data-collection processes. In the past decade, the randomized Kaczmarz (RK) algorithm has been studied extensively as an efficient iterative solver for such systems. However, the convergence study of RK in the noisy regime is limited and considers measurement noise in the right-hand side vector, $b$. Unfortunately, in practice, that is not always the case; the coefficient matrix $A$ can also be noisy. In this paper, we analyze the convergence of RK for noisy linear systems when the coefficient matrix, $A$, is corrupted with both additive and multiplicative noise, along with the noisy vector, $b$. In our analyses, the quantity $\tilde R=\| \tilde A^{\dagger} \|_2^2 \|\tilde A \|_F^2$ influences the convergence of RK, where $\tilde A$ represents a noisy version of $A$. We claim that our analysis is robust and realistically applicable, as we do not require information about the noiseless coefficient matrix, $A$, and considering different conditions on noise, we can control the convergence of RK. We substantiate our theoretical findings by performing comprehensive numerical experiments.
Submitted 31 August, 2023;
originally announced August 2023.
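For orientation, the classical randomized Kaczmarz iteration that the abstract above refers to can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the standard row-sampling rule (probability proportional to the squared row norm) and a small noiseless, consistent system. The function name `randomized_kaczmarz` and its parameters are hypothetical. In the doubly-noisy setting of the paper, the same update would be run on the noisy matrix and noisy right-hand side.

```python
import random


def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Classical randomized Kaczmarz for Ax = b (A given as a list of rows).

    Assumption: rows are sampled with probability proportional to their
    squared Euclidean norm, and no row is entirely zero.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    row_norms_sq = [sum(a * a for a in row) for row in A]
    x = [0.0] * n
    for _ in range(iters):
        # Pick one equation at random, weighted by squared row norm.
        i = rng.choices(range(m), weights=row_norms_sq, k=1)[0]
        # Project the current iterate onto the hyperplane <A[i], x> = b[i].
        residual = b[i] - sum(A[i][j] * x[j] for j in range(n))
        step = residual / row_norms_sq[i]
        x = [x[j] + step * A[i][j] for j in range(n)]
    return x


# Small consistent system whose exact solution is x = (1, 2).
A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
b = [4.0, 7.0, -1.0]
x_hat = randomized_kaczmarz(A, b)
```

Each step touches a single row of the matrix, which is what makes the method attractive for large-scale systems.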
-
Muti-Agent Proximal Policy Optimization For Data Freshness in UAV-assisted Networks
Authors:
Mouhamed Naby Ndiaye,
El Houcine Bergou,
Hajar El Hammouti
Abstract:
Unmanned aerial vehicles (UAVs) are seen as a promising technology to perform a wide range of tasks in wireless communication networks. In this work, we consider the deployment of a group of UAVs to collect the data generated by IoT devices. Specifically, we focus on the case where the collected data is time-sensitive, and it is critical to maintain its timeliness. Our objective is to optimally design the UAVs' trajectories and the subsets of visited IoT devices such that the global Age-of-Updates (AoU) is minimized. To this end, we formulate the studied problem as a mixed-integer nonlinear programming (MINLP) problem under time and quality of service constraints. To efficiently solve the resulting optimization problem, we investigate the cooperative Multi-Agent Reinforcement Learning (MARL) framework and propose an RL approach based on the popular on-policy Reinforcement Learning (RL) algorithm: Proximal Policy Optimization (PPO). Our approach leverages the centralized training decentralized execution (CTDE) framework where the UAVs learn their optimal policies while training a centralized value function. Our simulation results show that the proposed MAPPO approach reduces the global AoU by at least a factor of 1/2 compared to conventional off-policy reinforcement learning approaches.
Submitted 15 March, 2023;
originally announced March 2023.
-
Linear Scalarization for Byzantine-robust learning on non-IID data
Authors:
Latifa Errami,
El Houcine Bergou
Abstract:
In this work we study the problem of Byzantine-robust learning when data among clients is heterogeneous. We focus on poisoning attacks targeting the convergence of SGD. Although this problem has received great attention, the main Byzantine defenses rely on the IID assumption, causing them to fail when the data distribution is non-IID even with no attack. We propose the use of Linear Scalarization (LS) as an enhancing method to enable current defenses to circumvent Byzantine attacks in the non-IID setting. The LS method is based on the incorporation of a trade-off vector that penalizes the suspected malicious clients. Empirical analysis corroborates that the proposed LS variants are viable in the IID setting. For mild to strong non-IID data splits, LS is either comparable to or outperforms current approaches under state-of-the-art Byzantine attack scenarios.
Submitted 15 October, 2022;
originally announced October 2022.
-
Minibatch Stochastic Three Points Method for Unconstrained Smooth Minimization
Authors:
Soumia Boucherouite,
Grigory Malinovsky,
Peter Richtárik,
EL Houcine Bergou
Abstract:
In this paper, we propose a new zero order optimization method called minibatch stochastic three points (MiSTP) method to solve an unconstrained minimization problem in a setting where only an approximation of the objective function evaluation is possible. It is based on the recently proposed stochastic three points (STP) method (Bergou et al., 2020). At each iteration, MiSTP generates a random search direction in a similar manner to STP, but chooses the next iterate based solely on the approximation of the objective function rather than its exact evaluations. We also analyze our method's complexity in the nonconvex and convex cases and evaluate its performance on multiple machine learning tasks.
Submitted 16 September, 2022;
originally announced September 2022.
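The three-point comparison described in the abstract above can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the search direction is drawn from a normalized Gaussian, the step size decays as alpha0 / sqrt(k + 1), and the minibatch is sampled uniformly without replacement. The names `mistp`, `loss`, `alpha0`, and `batch_size` are hypothetical.

```python
import math
import random


def mistp(loss, data, x0, iters=200, batch_size=2, alpha0=1.0, seed=0):
    """Sketch of a minibatch stochastic three points iteration.

    loss(z, d) is the loss of parameters z on one sample d. Only minibatch
    averages of the loss are evaluated, never gradients and never the full
    objective.
    """
    rng = random.Random(seed)
    x = list(x0)
    for k in range(iters):
        batch = rng.sample(data, batch_size)

        def f_batch(z):
            # Minibatch approximation of the objective at z.
            return sum(loss(z, d) for d in batch) / len(batch)

        # Random unit direction (assumed distribution: normalized Gaussian).
        s = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(v * v for v in s)) or 1.0
        s = [v / norm for v in s]

        alpha = alpha0 / math.sqrt(k + 1)  # assumed step-size schedule
        plus = [xi + alpha * si for xi, si in zip(x, s)]
        minus = [xi - alpha * si for xi, si in zip(x, s)]

        # Keep whichever of the three points looks best on this minibatch.
        x = min((x, plus, minus), key=f_batch)
    return x


# Toy problem: estimate the mean of the data with a squared loss.
data = [1.0, 2.0, 3.0, 4.0, 5.0]
x_final = mistp(lambda z, d: (z[0] - d) ** 2, data, [10.0])
```

Because the comparison uses the same minibatch for all three candidates, the choice at each step depends only on approximate function values, which is the setting the abstract describes.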
-
Personalized Federated Learning with Communication Compression
Authors:
El Houcine Bergou,
Konstantin Burlachenko,
Aritra Dutta,
Peter Richtárik
Abstract:
In contrast to training traditional machine learning (ML) models in data centers, federated learning (FL) trains ML models over local datasets contained on resource-constrained heterogeneous edge devices. Existing FL algorithms aim to learn a single global model for all participating devices, which may not be helpful to all devices participating in the training due to the heterogeneity of the data across the devices. Recently, Hanzely and Richtárik (2020) proposed a new formulation for training personalized FL models aimed at balancing the trade-off between the traditional global model and the local models that could be trained by individual devices using their private data only. They derived a new algorithm, called Loopless Gradient Descent (L2GD), to solve it and showed that this algorithm leads to improved communication complexity guarantees in regimes when more personalization is required. In this paper, we equip their L2GD algorithm with a bidirectional compression mechanism to further reduce the communication bottleneck between the local devices and the server. Unlike other compression-based algorithms used in the FL setting, our compressed L2GD algorithm operates on a probabilistic communication protocol, where communication does not happen on a fixed schedule. Moreover, our compressed L2GD algorithm maintains a similar convergence rate as vanilla SGD without compression. To empirically validate the efficiency of our algorithm, we perform diverse numerical experiments on both convex and non-convex problems, using various compression techniques.
Submitted 12 September, 2022;
originally announced September 2022.
-
Client Selection in Federated Learning based on Gradients Importance
Authors:
Ouiame Marnissi,
Hajar El Hammouti,
El Houcine Bergou
Abstract:
Federated learning (FL) enables multiple devices to collaboratively learn a global model without sharing their personal data. In real-world applications, the different parties are likely to have heterogeneous data distributions and limited communication bandwidth. In this paper, we are interested in improving the communication efficiency of FL systems. We investigate and design a device selection strategy based on the importance of the gradient norms. In particular, our approach consists of selecting devices with the highest norms of gradient values at each communication round. We study the convergence and the performance of such a selection technique and compare it to existing ones. We perform several experiments in a non-IID setup. The results show the convergence of our method with a considerable increase in test accuracy compared to random selection.
Submitted 19 November, 2021;
originally announced November 2021.
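The selection rule stated in the abstract above (pick the devices with the highest gradient norms in each round) can be sketched in a few lines. This is an illustrative reading only: it assumes the Euclidean norm, gradients given as flat lists, and a fixed number k of devices per round. The function name and the dictionary layout are hypothetical.

```python
import math


def select_clients_by_gradient_norm(client_grads, k):
    """Return the ids of the k clients with the largest gradient L2 norm.

    client_grads maps a client id to its flattened local gradient.
    """
    norms = {
        cid: math.sqrt(sum(g * g for g in grad))
        for cid, grad in client_grads.items()
    }
    # Sort client ids by norm, largest first, and keep the top k.
    return sorted(norms, key=norms.get, reverse=True)[:k]


# Toy round with three devices; their norms are 5.0, 0.1 and about 1.41.
grads = {"a": [3.0, 4.0], "b": [0.1, 0.0], "c": [1.0, 1.0]}
selected = select_clients_by_gradient_norm(grads, k=2)
```

In a full FL round, only the selected devices would then upload their updates, which is where the communication saving comes from.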
-
On the Discrepancy between the Theoretical Analysis and Practical Implementations of Compressed Communication for Distributed Deep Learning
Authors:
Aritra Dutta,
El Houcine Bergou,
Ahmed M. Abdelmoniem,
Chen-Yu Ho,
Atal Narayan Sahu,
Marco Canini,
Panos Kalnis
Abstract:
Compressed communication, in the form of sparsification or quantization of stochastic gradients, is employed to reduce communication costs in distributed data-parallel training of deep neural networks. However, there exists a discrepancy between theory and practice: while theoretical analysis of most existing compression methods assumes compression is applied to the gradients of the entire model, many practical implementations operate individually on the gradients of each layer of the model. In this paper, we prove that layer-wise compression is, in theory, better, because the convergence rate is upper bounded by that of entire-model compression for a wide range of biased and unbiased compression methods. However, despite the theoretical bound, our experimental study of six well-known methods shows that convergence, in practice, may or may not be better, depending on the actual trained model and compression ratio. Our findings suggest that it would be advantageous for deep learning frameworks to include support for both layer-wise and entire-model compression.
Submitted 19 November, 2019;
originally announced November 2019.
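The distinction between layer-wise and entire-model compression in the abstract above can be made concrete with Top-k sparsification, one common compressor. The sketch below is an assumption-laden illustration, not one of the six methods evaluated in the paper: gradients are plain lists of floats, the compression ratio is the fraction of entries kept, and all function names are hypothetical.

```python
def top_k(values, k):
    """Keep the k largest-magnitude entries of a flat list; zero the rest."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(values)]


def compress_layerwise(layers, ratio):
    """Apply Top-k separately to the gradient of each layer."""
    return [top_k(layer, max(1, round(ratio * len(layer)))) for layer in layers]


def compress_entire_model(layers, ratio):
    """Apply Top-k once to the concatenation of all layer gradients."""
    flat = [v for layer in layers for v in layer]
    sparse = top_k(flat, max(1, round(ratio * len(flat))))
    # Split the sparsified vector back into the original layer shapes.
    out, start = [], 0
    for layer in layers:
        out.append(sparse[start:start + len(layer)])
        start += len(layer)
    return out


# Two layers whose gradient magnitudes differ by orders of magnitude.
layers = [[10.0, 9.0, 8.0, 7.0], [0.1, 0.2, 0.3, 0.4]]
per_layer = compress_layerwise(layers, 0.5)
whole = compress_entire_model(layers, 0.5)
```

On this toy input, the entire-model variant keeps every entry of the first layer and none of the second, while the layer-wise variant keeps two entries from each, which is the kind of behavioral gap between the two modes that the abstract points to.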
-
Direct Nonlinear Acceleration
Authors:
Aritra Dutta,
El Houcine Bergou,
Yunming Xiao,
Marco Canini,
Peter Richtárik
Abstract:
Optimization acceleration techniques such as momentum play a key role in state-of-the-art machine learning algorithms. Recently, generic vector sequence extrapolation techniques, such as regularized nonlinear acceleration (RNA) of Scieur et al., were proposed and shown to accelerate fixed point iterations. In contrast to RNA which computes extrapolation coefficients by (approximately) setting the gradient of the objective function to zero at the extrapolated point, we propose a more direct approach, which we call direct nonlinear acceleration (DNA). In DNA, we aim to minimize (an approximation of) the function value at the extrapolated point instead. We adopt a regularized approach with regularizers designed to prevent the model from entering a region in which the functional approximation is less precise. While the computational cost of DNA is comparable to that of RNA, our direct approach significantly outperforms RNA on both synthetic and real-world datasets. While the focus of this paper is on convex problems, we obtain very encouraging results in accelerating the training of neural networks.
Submitted 28 May, 2019;
originally announced May 2019.