Search | arXiv e-print repository

Semantic-Aware Resource Allocation in Constrained Networks with Limited User Participation

Authors: Ouiame Marnissi, Hajar EL Hammouti, El Houcine Bergou

Abstract: Semantic communication has gained attention as a key enabler for intelligent and context-aware communication. However, one of the key challenges of semantic communications is the need to tailor the resource allocation to meet the specific requirements of semantic transmission. In this paper, we focus on networks with limited resources where devices are constrained to transmit with limited bandwidth and power over large distances. Specifically, we devise an efficient strategy to select the most pertinent semantic features and participating users, taking into account the channel quality, the transmission time, and the recovery accuracy. To this end, we formulate an optimization problem with the goal of selecting the most relevant and accurate semantic features over devices while satisfying constraints on transmission time and quality of the channel. This involves optimizing communication resources, identifying participating users, and choosing specific semantic information for transmission. The underlying problem is inherently complex due to its non-convex nature and combinatorial constraints. To overcome this challenge, we efficiently approximate the optimal solution by solving a series of integer linear programming problems. Our numerical findings illustrate the effectiveness and efficiency of our approach in managing semantic communications in networks with limited resources.

Submitted 19 January, 2024; originally announced January 2024.

arXiv:2401.07756 [pdf, other]

Joint Probability Selection and Power Allocation for Federated Learning

Authors: Ouiame Marnissi, Hajar EL Hammouti, El Houcine Bergou

Abstract: In this paper, we study the performance of federated learning over wireless networks, where devices with a limited energy budget train a machine learning model. The federated learning performance depends on the selection of the clients participating in the learning at each round. Most existing studies suggest deterministic approaches for the client selection, resulting in challenging optimization problems that are usually solved using heuristics, and therefore without guarantees on the quality of the final solution. We formulate a new probabilistic approach to jointly select clients and allocate power optimally so that the expected number of participating clients is maximized. To solve the problem, a new alternating algorithm is proposed, where at each step, the closed-form solutions for user selection probabilities and power allocations are obtained. Our numerical results show that the proposed approach achieves a significant performance in terms of energy consumption, completion time and accuracy as compared to the studied benchmarks.

Submitted 15 January, 2024; originally announced January 2024.

arXiv:2312.15778 [pdf, other]

Age-of-Information in UAV-assisted Networks: a Decentralized Multi-Agent Optimization

Authors: Mouhamed Naby Ndiaye, El Houcine Bergou, Hajar El Hammouti

Abstract: Unmanned aerial vehicles (UAVs) are a highly promising technology with diverse applications in wireless networks. One of their primary uses is the collection of time-sensitive data from Internet of Things (IoT) devices. In UAV-assisted networks, the Age-of-Information (AoI) serves as a fundamental metric for quantifying data timeliness and freshness. In this work, we are interested in a generalized AoI formulation, where each packet's age is weighted based on its generation time. Our objective is to find the optimal UAVs' trajectories and the subsets of selected devices such that the weighted AoI is minimized. To address this challenge, we formulate the problem as a Mixed-Integer Nonlinear Programming (MINLP) problem, incorporating time and quality of service constraints. To efficiently tackle this complex problem and minimize communication overhead among UAVs, we propose a distributed approach. This approach enables drones to make independent decisions based on locally acquired data. Specifically, we reformulate our problem such that our objective function is easily decomposed into individual rewards. The reformulated problem is solved using a distributed implementation of Multi-Agent Reinforcement Learning (MARL). Our empirical results show that the proposed decentralized approach achieves results that are nearly equivalent to a centralized implementation with a notable reduction in communication overhead.

Submitted 25 December, 2023; originally announced December 2023.

arXiv:2310.12969 [pdf, other]

Demystifying the Myths and Legends of Nonconvex Convergence of SGD

Authors: Aritra Dutta, El Houcine Bergou, Soumia Boucherouite, Nicklas Werge, Melih Kandemir, Xin Li

Abstract: Stochastic gradient descent (SGD) and its variants are the main workhorses for solving large-scale optimization problems with nonconvex objective functions. Although the convergence of SGDs in the (strongly) convex case is well-understood, their convergence for nonconvex functions stands on weak mathematical foundations. Most existing studies on the nonconvex convergence of SGD show the complexity results based on either the minimum of the expected gradient norm or the functional sub-optimality gap (for functions with extra structural property) by searching the entire range of iterates. Hence the last iterations of SGDs do not necessarily maintain the same complexity guarantee. This paper shows that an $ε$-stationary point exists in the final iterates of SGDs, given a large enough total iteration budget, $T$, not just anywhere in the entire range of iterates -- a much stronger result than the existing one. Additionally, our analyses allow us to measure the density of the $ε$-stationary points in the final iterates of SGD, and we recover the classical $O(\frac{1}{\sqrt{T}})$ asymptotic rate under various existing assumptions on the objective function and the bounds on the stochastic gradient. As a result of our analyses, we addressed certain myths and legends related to the nonconvex convergence of SGD and posed some thought-provoking questions that could set new directions for research.

Submitted 19 October, 2023; originally announced October 2023.

arXiv:2309.02913 [pdf, ps, other]

Ensemble DNN for Age-of-Information Minimization in UAV-assisted Networks

Authors: Mouhamed Naby Ndiaye, El Houcine Bergou, Hajar El Hammouti

Abstract: This paper addresses the problem of Age-of-Information (AoI) in UAV-assisted networks. Our objective is to minimize the expected AoI across devices by optimizing UAVs' stopping locations and device selection probabilities. To tackle this problem, we first derive a closed-form expression of the expected AoI that involves the probabilities of selection of devices. Then, we formulate the problem as a non-convex minimization subject to quality of service constraints. Since the problem is challenging to solve, we propose an Ensemble Deep Neural Network (EDNN) based approach which takes advantage of the dual formulation of the studied problem. Specifically, the Deep Neural Networks (DNNs) in the ensemble are trained in an unsupervised manner using the Lagrangian function of the studied problem. Our experiments show that the proposed EDNN method outperforms traditional DNNs in reducing the expected AoI, achieving a remarkable reduction of $29.5\%$.

Submitted 6 September, 2023; originally announced September 2023.

Comments: 6 pages, 3 figures

arXiv:2308.16904 [pdf, other]

A Note on Randomized Kaczmarz Algorithm for Solving Doubly-Noisy Linear Systems

Authors: El Houcine Bergou, Soumia Boucherouite, Aritra Dutta, Xin Li, Anna Ma

Abstract: Large-scale linear systems, $Ax=b$, frequently arise in practice and demand effective iterative solvers. Often, these systems are noisy due to operational errors or faulty data-collection processes. In the past decade, the randomized Kaczmarz (RK) algorithm has been studied extensively as an efficient iterative solver for such systems. However, the convergence study of RK in the noisy regime is limited and considers measurement noise in the right-hand side vector, $b$. Unfortunately, in practice, that is not always the case; the coefficient matrix $A$ can also be noisy. In this paper, we analyze the convergence of RK for noisy linear systems when the coefficient matrix, $A$, is corrupted with both additive and multiplicative noise, along with the noisy vector, $b$. In our analyses, the quantity $\tilde R=\| \tilde A^{\dagger} \|_2^2 \|\tilde A \|_F^2$ influences the convergence of RK, where $\tilde A$ represents a noisy version of $A$. We claim that our analysis is robust and realistically applicable, as we do not require information about the noiseless coefficient matrix, $A$, and considering different conditions on noise, we can control the convergence of RK. We substantiate our theoretical findings by performing comprehensive numerical experiments.

Submitted 31 August, 2023; originally announced August 2023.

MSC Class: 15A06; 15A09; 15A10; 15A18; 65F10; 65Y20; 68Q25; 68W20; 68W40
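
Illustrative sketch (not from the paper): the abstract above names the randomized Kaczmarz iteration without spelling it out. The block below shows the standard noiseless RK update, where a row is sampled with probability proportional to its squared norm and the iterate is projected onto that row's hyperplane. The function name `randomized_kaczmarz` and its parameters are hypothetical, and the paper's noisy-matrix analysis is not reproduced here.

```python
import random


def randomized_kaczmarz(A, b, iterations=500, seed=0):
    """Standard randomized Kaczmarz for a consistent system Ax = b.

    A is a list of rows (lists of floats), b a list of floats.
    Assumption: at least one row of A is nonzero.
    """
    rng = random.Random(seed)
    # Squared Euclidean norm of each row; also used as sampling weights.
    row_norms_sq = [sum(v * v for v in row) for row in A]
    x = [0.0] * len(A[0])
    for _ in range(iterations):
        # Pick row i with probability ||a_i||^2 / ||A||_F^2.
        i = rng.choices(range(len(A)), weights=row_norms_sq, k=1)[0]
        # Project x onto the hyperplane {z : a_i . z = b_i}.
        residual = b[i] - sum(a * xi for a, xi in zip(A[i], x))
        step = residual / row_norms_sq[i]
        x = [xi + step * a for xi, a in zip(x, A[i])]
    return x


if __name__ == "__main__":
    A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
    b = [0.0, -5.0, 3.0]  # consistent with x = [1, -2]
    print(randomized_kaczmarz(A, b))
```

Replacing `A` and `b` with perturbed copies is one way to observe the noisy regime the abstract discusses; in that setting the iterates are generally not expected to reach the exact solution.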

arXiv:2303.08680 [pdf, other]

Muti-Agent Proximal Policy Optimization For Data Freshness in UAV-assisted Networks

Authors: Mouhamed Naby Ndiaye, El Houcine Bergou, Hajar El Hammouti

Abstract: Unmanned aerial vehicles (UAVs) are seen as a promising technology to perform a wide range of tasks in wireless communication networks. In this work, we consider the deployment of a group of UAVs to collect the data generated by IoT devices. Specifically, we focus on the case where the collected data is time-sensitive, and it is critical to maintain its timeliness. Our objective is to optimally design the UAVs' trajectories and the subsets of visited IoT devices such that the global Age-of-Updates (AoU) is minimized. To this end, we formulate the studied problem as a mixed-integer nonlinear programming (MINLP) problem under time and quality of service constraints. To efficiently solve the resulting optimization problem, we investigate the cooperative Multi-Agent Reinforcement Learning (MARL) framework and propose an RL approach based on the popular on-policy Reinforcement Learning (RL) algorithm: Proximal Policy Optimization (PPO). Our approach leverages the centralized training decentralized execution (CTDE) framework where the UAVs learn their optimal policies while training a centralized value function. Our simulation results show that the proposed MAPPO approach reduces the global AoU by at least a factor of 1/2 compared to conventional off-policy reinforcement learning approaches.

Submitted 15 March, 2023; originally announced March 2023.

arXiv:2210.08287 [pdf, other]

Linear Scalarization for Byzantine-robust learning on non-IID data

Authors: Latifa Errami, El Houcine Bergou

Abstract: In this work we study the problem of Byzantine-robust learning when data among clients is heterogeneous. We focus on poisoning attacks targeting the convergence of SGD. Although this problem has received great attention, the main Byzantine defenses rely on the IID assumption, causing them to fail when data distribution is non-IID even with no attack. We propose the use of Linear Scalarization (LS) as an enhancing method to enable current defenses to circumvent Byzantine attacks in the non-IID setting. The LS method is based on the incorporation of a trade-off vector that penalizes the suspected malicious clients. Empirical analysis corroborates that the proposed LS variants are viable in the IID setting. For mild to strong non-IID data splits, LS is either comparable or outperforming current approaches under state-of-the-art Byzantine attack scenarios.

Submitted 15 October, 2022; originally announced October 2022.

arXiv:2209.07883 [pdf, other]

Minibatch Stochastic Three Points Method for Unconstrained Smooth Minimization

Authors: Soumia Boucherouite, Grigory Malinovsky, Peter Richtárik, EL Houcine Bergou

Abstract: In this paper, we propose a new zero order optimization method called minibatch stochastic three points (MiSTP) method to solve an unconstrained minimization problem in a setting where only an approximation of the objective function evaluation is possible. It is based on the recently proposed stochastic three points (STP) method (Bergou et al., 2020). At each iteration, MiSTP generates a random search direction in a similar manner to STP, but chooses the next iterate based solely on the approximation of the objective function rather than its exact evaluations. We also analyze our method's complexity in the nonconvex and convex cases and evaluate its performance on multiple machine learning tasks.

Submitted 16 September, 2022; originally announced September 2022.

arXiv:2209.05370 [pdf, other]

Age-of-Updates Optimization for UAV-assisted Networks

Authors: Mouhamed Naby Ndiaye, El Houcine Bergou, Mounir Ghogho, Hajar El Hammouti

Abstract: Unmanned aerial vehicles (UAVs) have been proposed as a promising technology to collect data from IoT devices and relay it to the network. In this work, we are interested in scenarios where the data is updated periodically, and the collected updates are time-sensitive. In particular, the data updates may lose their value if they are not collected and analyzed in a timely manner. To maximize the data freshness, we optimize a new performance metric, namely the Age-of-Updates (AoU). Our objective is to carefully schedule the UAVs' hovering positions and the users' association so that the AoU is minimized. Unlike existing works where the association parameters are considered as binary variables, we assume that devices send their updates according to a probability distribution. As a consequence, instead of optimizing a deterministic objective function, the objective function is replaced by an expectation over the probability distribution. The expected AoU is therefore optimized under quality of service and energy constraints. The original problem being non-convex, we propose an equivalent convex optimization that we solve using an interior-point method. Our simulation results show the performance of the proposed approach against a binary association.

Submitted 12 September, 2022; originally announced September 2022.

Comments: 6 pages, 4 figures, IEEE GLOBECOM 2022

MSC Class: 90C30

arXiv:2209.05148 [pdf, other]

Personalized Federated Learning with Communication Compression

Authors: El Houcine Bergou, Konstantin Burlachenko, Aritra Dutta, Peter Richtárik

Abstract: In contrast to training traditional machine learning (ML) models in data centers, federated learning (FL) trains ML models over local datasets contained on resource-constrained heterogeneous edge devices. Existing FL algorithms aim to learn a single global model for all participating devices, which may not be helpful to all devices participating in the training due to the heterogeneity of the data across the devices. Recently, Hanzely and Richtárik (2020) proposed a new formulation for training personalized FL models aimed at balancing the trade-off between the traditional global model and the local models that could be trained by individual devices using their private data only. They derived a new algorithm, called Loopless Gradient Descent (L2GD), to solve it and showed that this algorithm leads to improved communication complexity guarantees in regimes when more personalization is required. In this paper, we equip their L2GD algorithm with a bidirectional compression mechanism to further reduce the communication bottleneck between the local devices and the server. Unlike other compression-based algorithms used in the FL-setting, our compressed L2GD algorithm operates on a probabilistic communication protocol, where communication does not happen on a fixed schedule. Moreover, our compressed L2GD algorithm maintains a similar convergence rate as vanilla SGD without compression. To empirically validate the efficiency of our algorithm, we perform diverse numerical experiments on both convex and non-convex problems and using various compression techniques.

Submitted 12 September, 2022; originally announced September 2022.

Comments: 19 pages, 11 figures, federated learning

ACM Class: G.1.6; G.1.m

arXiv:2111.11204 [pdf, other]

Client Selection in Federated Learning based on Gradients Importance

Authors: Ouiame Marnissi, Hajar El Hammouti, El Houcine Bergou

Abstract: Federated learning (FL) enables multiple devices to collaboratively learn a global model without sharing their personal data. In real-world applications, the different parties are likely to have heterogeneous data distribution and limited communication bandwidth. In this paper, we are interested in improving the communication efficiency of FL systems. We investigate and design a device selection strategy based on the importance of the gradient norms. In particular, our approach consists of selecting devices with the highest norms of gradient values at each communication round. We study the convergence and the performance of such a selection technique and compare it to existing ones. We perform several experiments with non-iid set-up. The results show the convergence of our method with a considerable increase of test accuracy compared to random selection.

Submitted 19 November, 2021; originally announced November 2021.

Comments: Submitted to ICC'2022
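
Illustrative sketch (not from the paper): the selection rule described in the abstract above, picking the devices with the largest gradient norms in a communication round, can be written as a top-k ranking. The function name `select_clients_by_gradient_norm`, the dictionary layout, and the use of the Euclidean norm are assumptions made for this example; the paper's exact norm and tie-breaking are not stated in the abstract.

```python
import math


def select_clients_by_gradient_norm(client_gradients, k):
    """Return the ids of the k clients with the largest gradient norms.

    client_gradients maps a client id to its flattened gradient
    (a list of floats). Euclidean norm is assumed for illustration.
    """
    norms = {
        client_id: math.sqrt(sum(g * g for g in gradient))
        for client_id, gradient in client_gradients.items()
    }
    # Rank clients from largest to smallest norm and keep the first k.
    ranked = sorted(norms, key=norms.get, reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    gradients = {"a": [0.1, 0.2], "b": [3.0, 4.0], "c": [1.0, 1.0]}
    print(select_clients_by_gradient_norm(gradients, k=2))
```

In a full federated round, only the selected clients would upload their updates, which is where the communication saving mentioned in the abstract would come from.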

arXiv:1911.08250 [pdf, other]

On the Discrepancy between the Theoretical Analysis and Practical Implementations of Compressed Communication for Distributed Deep Learning

Authors: Aritra Dutta, El Houcine Bergou, Ahmed M. Abdelmoniem, Chen-Yu Ho, Atal Narayan Sahu, Marco Canini, Panos Kalnis

Abstract: Compressed communication, in the form of sparsification or quantization of stochastic gradients, is employed to reduce communication costs in distributed data-parallel training of deep neural networks. However, there exists a discrepancy between theory and practice: while theoretical analysis of most existing compression methods assumes compression is applied to the gradients of the entire model, many practical implementations operate individually on the gradients of each layer of the model. In this paper, we prove that layer-wise compression is, in theory, better, because the convergence rate is upper bounded by that of entire-model compression for a wide range of biased and unbiased compression methods. However, despite the theoretical bound, our experimental study of six well-known methods shows that convergence, in practice, may or may not be better, depending on the actual trained model and compression ratio. Our findings suggest that it would be advantageous for deep learning frameworks to include support for both layer-wise and entire-model compression.

Submitted 19 November, 2019; originally announced November 2019.

Comments: To Appear In Proceedings of Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Journal ref: In Proceedings of Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

arXiv:1905.13278 [pdf, other]

A Stochastic Derivative Free Optimization Method with Momentum

Authors: Eduard Gorbunov, Adel Bibi, Ozan Sener, El Houcine Bergou, Peter Richtárik

Abstract: We consider the problem of unconstrained minimization of a smooth objective function in $\mathbb{R}^d$ in a setting where only function evaluations are possible. We propose and analyze a stochastic zeroth-order method with heavy ball momentum. In particular, we propose SMTP, a momentum version of the stochastic three-point method (STP) \cite{Bergou_2018}. We show new complexity results for non-convex, convex and strongly convex functions. We test our method on a collection of continuous control tasks on several MuJoCo \cite{Todorov_2012} environments with varying difficulty and compare against STP, other state-of-the-art derivative-free optimization algorithms and against policy gradient methods. SMTP significantly outperforms STP and all other methods that we considered in our numerical experiments. Our second contribution is SMTP with importance sampling which we call SMTP_IS. We provide convergence analysis of this method for non-convex, convex and strongly convex objectives.

Submitted 16 February, 2020; v1 submitted 30 May, 2019; originally announced May 2019.

arXiv:1905.11692 [pdf, other]

Direct Nonlinear Acceleration

Authors: Aritra Dutta, El Houcine Bergou, Yunming Xiao, Marco Canini, Peter Richtárik

Abstract: Optimization acceleration techniques such as momentum play a key role in state-of-the-art machine learning algorithms. Recently, generic vector sequence extrapolation techniques, such as regularized nonlinear acceleration (RNA) of Scieur et al., were proposed and shown to accelerate fixed point iterations. In contrast to RNA which computes extrapolation coefficients by (approximately) setting the gradient of the objective function to zero at the extrapolated point, we propose a more direct approach, which we call direct nonlinear acceleration (DNA). In DNA, we aim to minimize (an approximation of) the function value at the extrapolated point instead. We adopt a regularized approach with regularizers designed to prevent the model from entering a region in which the functional approximation is less precise. While the computational cost of DNA is comparable to that of RNA, our direct approach significantly outperforms RNA on both synthetic and real-world datasets. While the focus of this paper is on convex problems, we obtain very encouraging results in accelerating the training of neural networks.

Submitted 28 May, 2019; originally announced May 2019.

MSC Class: 65D15; 65B05; 46N40; 65F99; 68W99

arXiv:1902.03591 [pdf, other]

Stochastic Three Points Method for Unconstrained Smooth Minimization

Authors: El Houcine Bergou, Eduard Gorbunov, Peter Richtárik

Abstract: In this paper we consider the unconstrained minimization problem of a smooth function in ${\mathbb{R}}^n$ in a setting where only function evaluations are possible. We design a novel randomized derivative-free algorithm --- the stochastic three points (STP) method --- and analyze its iteration complexity. At each iteration, STP generates a random search direction according to a certain fixed probability law. Our assumptions on this law are very mild: roughly speaking, all laws which do not concentrate all measure on any halfspace passing through the origin will work. For instance, we allow for the uniform distribution on the sphere and also distributions that concentrate all measure on a positive spanning set. Given a current iterate $x$, STP compares the objective function at three points: $x$, $x+αs$ and $x-αs$, where $α>0$ is a stepsize parameter and $s$ is the random search direction. The best of these three points is the next iterate. We analyze the method STP under several stepsize selection schemes (fixed, decreasing, estimated through finite differences, etc). We study non-convex, convex and strongly convex cases.

Submitted 7 May, 2019; v1 submitted 10 February, 2019; originally announced February 2019.
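
Illustrative sketch (not from the paper): the STP iteration described in the abstract above, comparing the objective at $x$, $x+αs$ and $x-αs$ and keeping the best point, might look as follows. The function name `stp_minimize`, the fixed stepsize `alpha`, and the choice of directions drawn uniformly on the unit sphere are assumptions for this example; the abstract mentions other stepsize schemes and direction laws that are not shown.

```python
import math
import random


def stp_minimize(f, x0, alpha=0.1, iterations=2000, seed=0):
    """Derivative-free minimization with a fixed stepsize alpha.

    f takes a list of floats and returns a float; x0 is the start point.
    Returns the final iterate and its objective value.
    """
    rng = random.Random(seed)
    x = list(x0)
    fx = f(x)
    for _ in range(iterations):
        # Uniform direction on the unit sphere: normalize a Gaussian vector.
        g = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(v * v for v in g))
        if norm == 0.0:
            continue  # degenerate draw, try again
        s = [v / norm for v in g]
        plus = [xi + alpha * si for xi, si in zip(x, s)]
        minus = [xi - alpha * si for xi, si in zip(x, s)]
        # Keep the best of the three points x, x + alpha*s, x - alpha*s.
        for candidate in (plus, minus):
            f_candidate = f(candidate)
            if f_candidate < fx:
                x, fx = candidate, f_candidate
    return x, fx


if __name__ == "__main__":
    def sphere(v):
        return sum(t * t for t in v)

    print(stp_minimize(sphere, [3.0, -2.0]))
```

Because the current iterate is always one of the three candidates, the objective value never increases from one iteration to the next; with a fixed stepsize the iterates can only be expected to settle in a neighborhood of a minimizer whose size depends on `alpha`.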

arXiv:1902.01272 [pdf, other]

A Stochastic Derivative-Free Optimization Method with Importance Sampling: Theory and Learning to Control

Authors: Adel Bibi, El Houcine Bergou, Ozan Sener, Bernard Ghanem, Peter Richtárik

Abstract: We consider the problem of unconstrained minimization of a smooth objective function in $\mathbb{R}^n$ in a setting where only function evaluations are possible. While importance sampling is one of the most popular techniques used by machine learning practitioners to accelerate the convergence of their models when applicable, there is not much existing theory for this acceleration in the derivative-free setting. In this paper, we propose the first derivative free optimization method with importance sampling and derive new improved complexity results on non-convex, convex and strongly convex functions. We conduct extensive experiments on various synthetic and real LIBSVM datasets confirming our theoretical results. We further test our method on a collection of continuous control tasks on MuJoCo environments with varying difficulty. Experiments suggest that our algorithm is practical for high dimensional continuous control problems where importance sampling results in a significant sample complexity improvement.

Submitted 2 April, 2020; v1 submitted 4 February, 2019; originally announced February 2019.

arXiv:1805.11588 [pdf, ps, other]

A Line-Search Algorithm Inspired by the Adaptive Cubic Regularization Framework and Complexity Analysis

Authors: El houcine Bergou, Youssef Diouane, Serge Gratton

Abstract: The adaptive regularized framework using cubics has emerged as an alternative to line-search and trust-region algorithms for smooth nonconvex optimization, with an optimal complexity amongst second-order methods. In this paper, we propose and analyze the use of an iteration dependent scaled norm in the adaptive regularized framework using cubics. Within such a scaled norm, the obtained method behaves as a line-search algorithm along the quasi-Newton direction with a special backtracking strategy. Under appropriate assumptions, the new algorithm enjoys the same convergence and complexity properties as the adaptive regularized algorithm using cubics. The complexity for finding an approximate first-order stationary point can be improved to be optimal whenever a second order version of the proposed algorithm is regarded. In a similar way, using the same scaled norm to define the trust-region neighborhood, we show that the trust-region algorithm behaves as a line-search algorithm. The good potential of the obtained algorithms is shown on a set of large scale optimization problems.

Submitted 29 May, 2018; originally announced May 2018.

MSC Class: 49M05; 49M15; 90C06; 90C60

arXiv:1411.4608 [pdf, ps, other]

doi 10.1016/j.apnum.2018.11.008

On the Convergence of a Non-linear Ensemble Kalman Smoother

Authors: El houcine Bergou, Serge Gratton, Jan Mandel

Abstract: Ensemble methods, such as the ensemble Kalman filter (EnKF), the local ensemble transform Kalman filter (LETKF), and the ensemble Kalman smoother (EnKS) are widely used in sequential data assimilation, where state vectors are of huge dimension. Little is known, however, about the asymptotic behavior of ensemble methods. In this paper, we prove convergence in $L^p$ of the ensemble Kalman smoother to the Kalman smoother in the large-ensemble limit, as well as the convergence of EnKS-4DVAR, which is a Levenberg-Marquardt-like algorithm with EnKS as the linear solver, to the classical Levenberg-Marquardt algorithm in which the linearized problem is solved exactly.

Submitted 17 November, 2014; originally announced November 2014.

Journal ref: Applied Numerical Mathematics 137 (2019) 151-168

Showing 1–19 of 19 results for author: Bergou, E H