-
Accelerating Distributed Stochastic Optimization via Self-Repellent Random Walks
Authors:
Jie Hu,
Vishwaraj Doshi,
Do Young Eun
Abstract:
We study a family of distributed stochastic optimization algorithms where gradients are sampled by a token traversing a network of agents in random-walk fashion. Typically, these random-walks are chosen to be Markov chains that asymptotically sample from a desired target distribution, and play a critical role in the convergence of the optimization iterates. In this paper, we take a novel approach…
▽ More
We study a family of distributed stochastic optimization algorithms where gradients are sampled by a token traversing a network of agents in random-walk fashion. Typically, these random-walks are chosen to be Markov chains that asymptotically sample from a desired target distribution, and play a critical role in the convergence of the optimization iterates. In this paper, we take a novel approach by replacing the standard linear Markovian token by one which follows a nonlinear Markov chain - namely the Self-Repellent Radom Walk (SRRW). Defined for any given 'base' Markov chain, the SRRW, parameterized by a positive scalar α, is less likely to transition to states that were highly visited in the past, thus the name. In the context of MCMC sampling on a graph, a recent breakthrough in Doshi et al. (2023) shows that the SRRW achieves O(1/α) decrease in the asymptotic variance for sampling. We propose the use of a 'generalized' version of the SRRW to drive token algorithms for distributed stochastic optimization in the form of stochastic approximation, termed SA-SRRW. We prove that the optimization iterate errors of the resulting SA-SRRW converge to zero almost surely and prove a central limit theorem, deriving the explicit form of the resulting asymptotic covariance matrix corresponding to iterate errors. This asymptotic covariance is always smaller than that of an algorithm driven by the base Markov chain and decreases at rate O(1/α^2) - the performance benefit of using SRRW thereby amplified in the stochastic optimization context. Empirical results support our theoretical findings.
△ Less
Submitted 17 January, 2024;
originally announced January 2024.
-
Central Limit Theorem for Two-Timescale Stochastic Approximation with Markovian Noise: Theory and Applications
Authors:
Jie Hu,
Vishwaraj Doshi,
Do Young Eun
Abstract:
Two-timescale stochastic approximation (TTSA) is among the most general frameworks for iterative stochastic algorithms. This includes well-known stochastic optimization methods such as SGD variants and those designed for bilevel or minimax problems, as well as reinforcement learning like the family of gradient-based temporal difference (GTD) algorithms. In this paper, we conduct an in-depth asympt…
▽ More
Two-timescale stochastic approximation (TTSA) is among the most general frameworks for iterative stochastic algorithms. This includes well-known stochastic optimization methods such as SGD variants and those designed for bilevel or minimax problems, as well as reinforcement learning like the family of gradient-based temporal difference (GTD) algorithms. In this paper, we conduct an in-depth asymptotic analysis of TTSA under controlled Markovian noise via central limit theorem (CLT), uncovering the coupled dynamics of TTSA influenced by the underlying Markov chain, which has not been addressed by previous CLT results of TTSA only with Martingale difference noise. Building upon our CLT, we expand its application horizon of efficient sampling strategies from vanilla SGD to a wider TTSA context in distributed learning, thus broadening the scope of Hu et al. (2022). In addition, we leverage our CLT result to deduce the statistical properties of GTD algorithms with nonlinear function approximation using Markovian samples and show their identical asymptotic performance, a perspective not evident from current finite-time bounds.
△ Less
Submitted 13 February, 2024; v1 submitted 17 January, 2024;
originally announced January 2024.
-
Self-Repellent Random Walks on General Graphs -- Achieving Minimal Sampling Variance via Nonlinear Markov Chains
Authors:
Vishwaraj Doshi,
Jie Hu,
Do Young Eun
Abstract:
We consider random walks on discrete state spaces, such as general undirected graphs, where the random walkers are designed to approximate a target quantity over the network topology via sampling and neighborhood exploration in the form of Markov chain Monte Carlo (MCMC) procedures. Given any Markov chain corresponding to a target probability distribution, we design a self-repellent random walk (S…
▽ More
We consider random walks on discrete state spaces, such as general undirected graphs, where the random walkers are designed to approximate a target quantity over the network topology via sampling and neighborhood exploration in the form of Markov chain Monte Carlo (MCMC) procedures. Given any Markov chain corresponding to a target probability distribution, we design a self-repellent random walk (SRRW) which is less likely to transition to nodes that were highly visited in the past, and more likely to transition to seldom visited nodes. For a class of SRRWs parameterized by a positive real α, we prove that the empirical distribution of the process converges almost surely to the the target (stationary) distribution of the underlying Markov chain kernel. We then provide a central limit theorem and derive the exact form of the arising asymptotic co-variance matrix, which allows us to show that the SRRW with a stronger repellence (larger α) always achieves a smaller asymptotic covariance, in the sense of Loewner ordering of co-variance matrices. Especially for SRRW-driven MCMC algorithms, we show that the decrease in the asymptotic sampling variance is of the order O(1/α), eventually going down to zero. Finally, we provide numerical simulations complimentary to our theoretical results, also empirically testing a version of SRRW with α increasing in time to combine the benefits of smaller asymptotic variance due to large α, with empirically observed faster mixing properties of SRRW with smaller α.
△ Less
Submitted 28 January, 2024; v1 submitted 8 May, 2023;
originally announced May 2023.
-
Minimizing File Transfer Time in Opportunistic Spectrum Access Model
Authors:
Jie Hu,
Vishwaraj Doshi,
Do Young Eun
Abstract:
We study the file transfer problem in opportunistic spectrum access (OSA) model, which has been widely studied in throughput-oriented applications for max-throughput strategies and in delay-related works that commonly assume identical channel rates and fixed file sizes. Our work explicitly considers minimizing the file transfer time for a given file in a set of heterogeneous-rate Bernoulli channel…
▽ More
We study the file transfer problem in opportunistic spectrum access (OSA) model, which has been widely studied in throughput-oriented applications for max-throughput strategies and in delay-related works that commonly assume identical channel rates and fixed file sizes. Our work explicitly considers minimizing the file transfer time for a given file in a set of heterogeneous-rate Bernoulli channels, showing that max-throughput policy doesn't minimize file transfer time in general. We formulate a mathematical framework for static extend to dynamic policies by map** our file transfer problem to a stochastic shortest path problem. We analyze the performance of our proposed static and dynamic optimal policies over the max-throughput policy. We propose a mixed-integer programming formulation as an efficient alternative way to obtain the dynamic optimal policy and show a huge reduction in computation time. Then, we propose a heuristic policy that takes into account the performance-complexity tradeoff and consider the online implementation with unknown channel parameters. Furthermore, we present numerical simulations to support our analytical results and discuss the effect of switching delay on different policies. Finally, we extend the file transfer problem to Markovian channels and demonstrate the impact of the correlation of each channel.
△ Less
Submitted 5 October, 2022;
originally announced October 2022.
-
Opportunistic Spectrum Access: Does Maximizing Throughput Minimize File Transfer Time?
Authors:
Jie Hu,
Vishwaraj Doshi,
Do Young Eun
Abstract:
The Opportunistic Spectrum Access (OSA) model has been developed for the secondary users (SUs) to exploit the stochastic dynamics of licensed channels for file transfer in an opportunistic manner. Common approaches to design channel sensing strategies for throughput-oriented applications tend to maximize the long-term throughput, with the hope that it provides reduced file transfer time as well. I…
▽ More
The Opportunistic Spectrum Access (OSA) model has been developed for the secondary users (SUs) to exploit the stochastic dynamics of licensed channels for file transfer in an opportunistic manner. Common approaches to design channel sensing strategies for throughput-oriented applications tend to maximize the long-term throughput, with the hope that it provides reduced file transfer time as well. In this paper, we show that this is not correct in general, especially for small files. Unlike prior delay-related works that seldom consider the heterogeneous channel rate and bursty incoming packets, our work explicitly considers minimizing the file transfer time of a single file consisting of multiple packets in a set of heterogeneous channels. We formulate a mathematical framework for the static policy, and extend to dynamic policy by map** our file transfer problem to the stochastic shortest path problem. We analyze the performance of our proposed static optimal and dynamic optimal policies over the policy that maximizes long-term throughput. We then propose a heuristic policy that takes into account the performance-complexity tradeoff and an extension to online implementation with unknown channel parameters, and also present the regret bound for our online algorithm. We also present numerical simulations that reflect our analytical results.
△ Less
Submitted 26 September, 2021; v1 submitted 23 September, 2021;
originally announced September 2021.
-
Fiedler Vector Approximation via Interacting Random Walks
Authors:
Vishwaraj Doshi,
Do Young Eun
Abstract:
The Fiedler vector of a graph, namely the eigenvector corresponding to the second smallest eigenvalue of a graph Laplacian matrix, plays an important role in spectral graph theory with applications in problems such as graph bi-partitioning and envelope reduction. Algorithms designed to estimate this quantity usually rely on a priori knowledge of the entire graph, and employ techniques such as grap…
▽ More
The Fiedler vector of a graph, namely the eigenvector corresponding to the second smallest eigenvalue of a graph Laplacian matrix, plays an important role in spectral graph theory with applications in problems such as graph bi-partitioning and envelope reduction. Algorithms designed to estimate this quantity usually rely on a priori knowledge of the entire graph, and employ techniques such as graph sparsification and power iterations, which have obvious shortcomings in cases where the graph is unknown, or changing dynamically. In this paper, we develop a framework in which we construct a stochastic process based on a set of interacting random walks on a graph and show that a suitably scaled version of our stochastic process converges to the Fiedler vector for a sufficiently large number of walks. Like other techniques based on exploratory random walks and on-the-fly computations, such as Markov Chain Monte Carlo (MCMC), our algorithm overcomes challenges typically faced by power iteration based approaches. But, unlike any existing random walk based method such as MCMCs where the focus is on the leading eigenvector, our framework with interacting random walks converges to the Fiedler vector (second eigenvector). We also provide numerical results to confirm our theoretical findings on different graphs, and show that our algorithm performs well over a wide range of parameters and the number of random walks. Simulations results over time varying dynamic graphs are also provided to show the efficacy of our random walk based technique in such settings. As an important contribution, we extend our results and show that our framework is applicable for approximating not just the Fiedler vector of graph Laplacians, but also the second eigenvector of any time reversible Markov Chain kernel via interacting random walks.
△ Less
Submitted 1 February, 2020;
originally announced February 2020.