Search | arXiv e-print repository

Accelerating Distributed Stochastic Optimization via Self-Repellent Random Walks

Authors: Jie Hu, Vishwaraj Doshi, Do Young Eun

Abstract: We study a family of distributed stochastic optimization algorithms where gradients are sampled by a token traversing a network of agents in random-walk fashion. Typically, these random walks are chosen to be Markov chains that asymptotically sample from a desired target distribution, and play a critical role in the convergence of the optimization iterates. In this paper, we take a novel approach by replacing the standard linear Markovian token by one which follows a nonlinear Markov chain - namely the Self-Repellent Random Walk (SRRW). Defined for any given 'base' Markov chain, the SRRW, parameterized by a positive scalar α, is less likely to transition to states that were highly visited in the past, thus the name. In the context of MCMC sampling on a graph, a recent breakthrough in Doshi et al. (2023) shows that the SRRW achieves O(1/α) decrease in the asymptotic variance for sampling. We propose the use of a 'generalized' version of the SRRW to drive token algorithms for distributed stochastic optimization in the form of stochastic approximation, termed SA-SRRW. We prove that the optimization iterate errors of the resulting SA-SRRW converge to zero almost surely and prove a central limit theorem, deriving the explicit form of the resulting asymptotic covariance matrix corresponding to iterate errors. This asymptotic covariance is always smaller than that of an algorithm driven by the base Markov chain and decreases at rate O(1/α^2) - the performance benefit of using SRRW thereby amplified in the stochastic optimization context. Empirical results support our theoretical findings.

Submitted 17 January, 2024; originally announced January 2024.

Comments: Accepted for oral presentation at the Twelfth International Conference on Learning Representations (ICLR 2024)

arXiv:2401.09339

Central Limit Theorem for Two-Timescale Stochastic Approximation with Markovian Noise: Theory and Applications

Authors: Jie Hu, Vishwaraj Doshi, Do Young Eun

Abstract: Two-timescale stochastic approximation (TTSA) is among the most general frameworks for iterative stochastic algorithms. This includes well-known stochastic optimization methods such as SGD variants and those designed for bilevel or minimax problems, as well as reinforcement learning like the family of gradient-based temporal difference (GTD) algorithms. In this paper, we conduct an in-depth asymptotic analysis of TTSA under controlled Markovian noise via central limit theorem (CLT), uncovering the coupled dynamics of TTSA influenced by the underlying Markov chain, which has not been addressed by previous CLT results of TTSA only with Martingale difference noise. Building upon our CLT, we expand its application horizon of efficient sampling strategies from vanilla SGD to a wider TTSA context in distributed learning, thus broadening the scope of Hu et al. (2022). In addition, we leverage our CLT result to deduce the statistical properties of GTD algorithms with nonlinear function approximation using Markovian samples and show their identical asymptotic performance, a perspective not evident from current finite-time bounds.

Submitted 13 February, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

Comments: To appear in AISTATS 2024

arXiv:2305.05097

Self-Repellent Random Walks on General Graphs -- Achieving Minimal Sampling Variance via Nonlinear Markov Chains

Authors: Vishwaraj Doshi, Jie Hu, Do Young Eun

Abstract: We consider random walks on discrete state spaces, such as general undirected graphs, where the random walkers are designed to approximate a target quantity over the network topology via sampling and neighborhood exploration in the form of Markov chain Monte Carlo (MCMC) procedures. Given any Markov chain corresponding to a target probability distribution, we design a self-repellent random walk (SRRW) which is less likely to transition to nodes that were highly visited in the past, and more likely to transition to seldom visited nodes. For a class of SRRWs parameterized by a positive real α, we prove that the empirical distribution of the process converges almost surely to the target (stationary) distribution of the underlying Markov chain kernel. We then provide a central limit theorem and derive the exact form of the arising asymptotic covariance matrix, which allows us to show that the SRRW with a stronger repellence (larger α) always achieves a smaller asymptotic covariance, in the sense of Loewner ordering of covariance matrices. Especially for SRRW-driven MCMC algorithms, we show that the decrease in the asymptotic sampling variance is of the order O(1/α), eventually going down to zero. Finally, we provide numerical simulations complementary to our theoretical results, also empirically testing a version of SRRW with α increasing in time to combine the benefits of smaller asymptotic variance due to large α, with empirically observed faster mixing properties of SRRW with smaller α.

Submitted 28 January, 2024; v1 submitted 8 May, 2023; originally announced May 2023.

Comments: Selected for oral presentation at ICML 2023. Recipient of Outstanding Paper award

arXiv:2210.02557

doi 10.1109/TMC.2022.3212926

Minimizing File Transfer Time in Opportunistic Spectrum Access Model

Authors: Jie Hu, Vishwaraj Doshi, Do Young Eun

Abstract: We study the file transfer problem in the opportunistic spectrum access (OSA) model, which has been widely studied in throughput-oriented applications for max-throughput strategies and in delay-related works that commonly assume identical channel rates and fixed file sizes. Our work explicitly considers minimizing the file transfer time for a given file in a set of heterogeneous-rate Bernoulli channels, showing that the max-throughput policy doesn't minimize file transfer time in general. We formulate a mathematical framework for static policies and extend it to dynamic policies by mapping our file transfer problem to a stochastic shortest path problem. We analyze the performance of our proposed static and dynamic optimal policies over the max-throughput policy. We propose a mixed-integer programming formulation as an efficient alternative way to obtain the dynamic optimal policy and show a huge reduction in computation time. Then, we propose a heuristic policy that takes into account the performance-complexity tradeoff and consider the online implementation with unknown channel parameters. Furthermore, we present numerical simulations to support our analytical results and discuss the effect of switching delay on different policies. Finally, we extend the file transfer problem to Markovian channels and demonstrate the impact of the correlation of each channel.

Submitted 5 October, 2022; originally announced October 2022.

Comments: To appear in IEEE Transactions on Mobile Computing. arXiv admin note: substantial text overlap with arXiv:2109.11624

arXiv:2109.11624

doi 10.23919/WiOpt52861.2021.9589243

Opportunistic Spectrum Access: Does Maximizing Throughput Minimize File Transfer Time?

Authors: Jie Hu, Vishwaraj Doshi, Do Young Eun

Abstract: The Opportunistic Spectrum Access (OSA) model has been developed for the secondary users (SUs) to exploit the stochastic dynamics of licensed channels for file transfer in an opportunistic manner. Common approaches to design channel sensing strategies for throughput-oriented applications tend to maximize the long-term throughput, with the hope that it provides reduced file transfer time as well. In this paper, we show that this is not correct in general, especially for small files. Unlike prior delay-related works that seldom consider the heterogeneous channel rate and bursty incoming packets, our work explicitly considers minimizing the file transfer time of a single file consisting of multiple packets in a set of heterogeneous channels. We formulate a mathematical framework for the static policy, and extend to dynamic policy by mapping our file transfer problem to the stochastic shortest path problem. We analyze the performance of our proposed static optimal and dynamic optimal policies over the policy that maximizes long-term throughput. We then propose a heuristic policy that takes into account the performance-complexity tradeoff and an extension to online implementation with unknown channel parameters, and also present the regret bound for our online algorithm. We also present numerical simulations that reflect our analytical results.

Submitted 26 September, 2021; v1 submitted 23 September, 2021; originally announced September 2021.

Comments: The shorter version appears in WiOpt 2021

arXiv:2002.00283

doi 10.1145/3379502

Fiedler Vector Approximation via Interacting Random Walks

Authors: Vishwaraj Doshi, Do Young Eun

Abstract: The Fiedler vector of a graph, namely the eigenvector corresponding to the second smallest eigenvalue of a graph Laplacian matrix, plays an important role in spectral graph theory with applications in problems such as graph bi-partitioning and envelope reduction. Algorithms designed to estimate this quantity usually rely on a priori knowledge of the entire graph, and employ techniques such as graph sparsification and power iterations, which have obvious shortcomings in cases where the graph is unknown, or changing dynamically. In this paper, we develop a framework in which we construct a stochastic process based on a set of interacting random walks on a graph and show that a suitably scaled version of our stochastic process converges to the Fiedler vector for a sufficiently large number of walks. Like other techniques based on exploratory random walks and on-the-fly computations, such as Markov Chain Monte Carlo (MCMC), our algorithm overcomes challenges typically faced by power iteration based approaches. But, unlike any existing random walk based method such as MCMCs where the focus is on the leading eigenvector, our framework with interacting random walks converges to the Fiedler vector (second eigenvector). We also provide numerical results to confirm our theoretical findings on different graphs, and show that our algorithm performs well over a wide range of parameters and the number of random walks. Simulation results over time-varying dynamic graphs are also provided to show the efficacy of our random walk based technique in such settings. As an important contribution, we extend our results and show that our framework is applicable for approximating not just the Fiedler vector of graph Laplacians, but also the second eigenvector of any time reversible Markov Chain kernel via interacting random walks.

Submitted 1 February, 2020; originally announced February 2020.

Comments: in ACM SIGMETRICS, Boston, MA, June 2020, to appear. (Also will be in Proc. ACM Meas. Anal. Comput. Syst (POMACS), March 2020)

Showing 1–6 of 6 results for author: Doshi, V