-
Communication-Constrained Bandits under Additive Gaussian Noise
Authors:
Prathamesh Mayekar,
Jonathan Scarlett,
Vincent Y. F. Tan
Abstract:
We study a distributed stochastic multi-armed bandit where a client supplies the learner with communication-constrained feedback based on the rewards for the corresponding arm pulls. In our setup, the client must encode the rewards such that the second moment of the encoded rewards is no more than $P$, and this encoded reward is further corrupted by additive Gaussian noise of variance $σ^2$; the learner only has access to this corrupted reward. For this setting, we derive an information-theoretic lower bound of $Ω\left(\sqrt{\frac{KT}{\mathtt{SNR} \wedge 1}} \right)$ on the minimax regret of any scheme, where $\mathtt{SNR} := \frac{P}{σ^2}$, and $K$ and $T$ are the number of arms and the time horizon, respectively. Furthermore, we propose a multi-phase bandit algorithm, $\mathtt{UE\text{-}UCB++}$, which matches this lower bound up to a minor additive factor. $\mathtt{UE\text{-}UCB++}$ performs uniform exploration in its initial phases and then utilizes the {\em upper confidence bound} (UCB) bandit algorithm in its final phase. An interesting feature of $\mathtt{UE\text{-}UCB++}$ is that the coarser estimates of the mean rewards formed during a uniform exploration phase help to refine the encoding protocol in the next phase, leading to more accurate mean estimates of the rewards in the subsequent phase. This positive reinforcement cycle is critical to reducing the number of uniform exploration rounds and closely matching our lower bound.
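As a rough illustration of how the lower bound in this abstract scales, the sketch below evaluates $\sqrt{KT/(\mathtt{SNR} \wedge 1)}$ for given values of $K$, $T$, $P$, and $σ^2$. The function names are hypothetical and not taken from the paper, and the unspecified constant hidden in the $Ω(\cdot)$ notation is ignored, so the output is only an order-of-magnitude quantity.

```python
import math


def snr(power: float, noise_variance: float) -> float:
    """Signal-to-noise ratio SNR = P / sigma^2, as defined in the abstract."""
    if power <= 0 or noise_variance <= 0:
        raise ValueError("power and noise_variance must be positive")
    return power / noise_variance


def regret_lower_bound_rate(num_arms: int, horizon: int,
                            power: float, noise_variance: float) -> float:
    """Evaluate sqrt(K * T / min(SNR, 1)).

    Assumption: the constant factor inside the Omega(.) bound is dropped,
    so this is the scaling of the bound, not the bound itself.
    """
    effective_snr = min(snr(power, noise_variance), 1.0)
    return math.sqrt(num_arms * horizon / effective_snr)


if __name__ == "__main__":
    # Low SNR (P / sigma^2 = 0.25) inflates the rate by 1 / sqrt(0.25) = 2.
    print(regret_lower_bound_rate(10, 1000, power=1.0, noise_variance=4.0))
    # Once SNR >= 1 the rate stops improving with more power.
    print(regret_lower_bound_rate(10, 1000, power=8.0, noise_variance=1.0))
```

The `min(…, 1.0)` term mirrors the $\mathtt{SNR} \wedge 1$ in the bound: extra transmit power beyond $\mathtt{SNR} = 1$ does not lower this rate.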
Submitted 6 June, 2023; v1 submitted 25 April, 2023;
originally announced April 2023.
-
Compression for Distributed Optimization and Timely Updates
Authors:
Prathamesh Mayekar
Abstract:
The goal of this thesis is to study the compression problems arising in distributed computing systematically. In the first part of the thesis, we study gradient compression for distributed first-order optimization. We begin by establishing information theoretic lower bounds on optimization accuracy when only finite precision gradients are used. Also, we develop fast quantizers for gradient compression, which, when used with standard first-order optimization algorithms, match the aforementioned lower bounds. In the second part of the thesis, we study distributed mean estimation, an important primitive for distributed optimization algorithms. We develop efficient estimators which improve over state of the art by efficiently using the side information present at the center. We also revisit the Gaussian rate-distortion problem and develop efficient quantizers for this problem in both the side-information and the no side information setting. Finally, we study the problem of entropic compression of the symbols transmitted by the edge devices to the center, which commonly arise in cyber-physical systems. Our goal is to design entropic compression schemes that allow the information to be transmitted in a 'timely' manner, which, in turn, enables the center to have access to the latest information for computation. We shed light on the structure of the optimal entropic compression scheme and, using this structure, we develop efficient algorithms to compute this optimal compression scheme.
Submitted 11 January, 2023;
originally announced January 2023.
-
Fundamental limits of over-the-air optimization: Are analog schemes optimal?
Authors:
Shubham K Jha,
Prathamesh Mayekar,
Himanshu Tyagi
Abstract:
We consider over-the-air convex optimization on a $d$-dimensional space where coded gradients are sent over an additive Gaussian noise channel with variance $σ^2$. The codewords satisfy an average power constraint $P$, resulting in a signal-to-noise ratio (SNR) of $P/σ^2$. We derive bounds for the convergence rates for over-the-air optimization. Our first result is a lower bound for the convergence rate showing that any code must slow down the convergence rate by a factor of roughly $\sqrt{d/\log(1+\mathtt{SNR})}$. Next, we consider a popular class of schemes called \emph{analog coding}, where a linear function of the gradient is sent. We show that a simple scaled transmission analog coding scheme results in a slowdown in convergence rate by a factor of $\sqrt{d(1+1/\mathtt{SNR})}$. This matches the previous lower bound up to constant factors for low SNR, making the scaled transmission scheme optimal at low SNR. However, we show that this slowdown is necessary for any analog coding scheme. In particular, a slowdown in convergence by a factor of $\sqrt{d}$ for analog coding remains even when SNR tends to infinity. Remarkably, we present a simple quantize-and-modulate scheme that uses \emph{Amplitude Shift Keying} and almost attains the optimal convergence rate at all SNRs.
Submitted 15 September, 2021; v1 submitted 11 September, 2021;
originally announced September 2021.
-
Information-constrained optimization: can adaptive processing of gradients help?
Authors:
Jayadev Acharya,
Clément L. Canonne,
Prathamesh Mayekar,
Himanshu Tyagi
Abstract:
We revisit first-order optimization under local information constraints such as local privacy, gradient quantization, and computational constraints limiting access to a few coordinates of the gradient. In this setting, the optimization algorithm is not allowed to directly access the complete output of the gradient oracle, but only gets limited information about it subject to the local information constraints.
We study the role of adaptivity in processing the gradient output to obtain this limited information from it. We consider optimization for both convex and strongly convex functions and obtain tight or nearly tight lower bounds for the convergence rate, when adaptive gradient processing is allowed. Prior work was restricted to convex functions and allowed only nonadaptive processing of gradients. For both of these function classes and for the three information constraints mentioned above, our lower bound implies that adaptive processing of gradients cannot outperform nonadaptive processing in most regimes of interest. We complement these results by exhibiting a natural optimization problem under information constraints for which adaptive processing of gradients strictly outperforms nonadaptive processing.
Submitted 2 April, 2021;
originally announced April 2021.
-
Wyner-Ziv Estimators for Distributed Mean Estimation with Side Information and Optimization
Authors:
Prathamesh Mayekar,
Shubham Jha,
Ananda Theertha Suresh,
Himanshu Tyagi
Abstract:
Communication efficient distributed mean estimation is an important primitive that arises in many distributed learning and optimization scenarios such as federated learning. Without any probabilistic assumptions on the underlying data, we study the problem of distributed mean estimation where the server has access to side information. We propose \emph{Wyner-Ziv estimators}, which are communication and computationally efficient and near-optimal when an upper bound for the distance between the side information and the data is known. As a corollary, we also show that our algorithms provide efficient schemes for the classic Wyner-Ziv problem in information theory. In a different direction, when there is no knowledge assumed about the distance between side information and the data, we present an alternative Wyner-Ziv estimator that uses correlated sampling. This latter setting offers {\em universal recovery guarantees}, and perhaps will be of interest in practice when the number of users is large and keeping track of the distances between the data and the side information may not be possible.
With this mean estimator at our disposal, we revisit basic problems in decentralized optimization and compression where our Wyner-Ziv estimator yields algorithms with almost optimal performance. First, we consider the problem of communication constrained distributed optimization and provide an algorithm which attains the optimal convergence rate by exploiting the fact that the gradient estimates are close to each other. Specifically, the gradient compression scheme in our algorithm first uses half of the parties to form side information and then uses our Wyner-Ziv estimator to compress the remaining half of the gradient estimates.
Submitted 14 November, 2022; v1 submitted 24 November, 2020;
originally announced November 2020.
-
Limits on Gradient Compression for Stochastic Optimization
Authors:
Prathamesh Mayekar,
Himanshu Tyagi
Abstract:
We consider stochastic optimization over $\ell_p$ spaces using access to a first-order oracle. We ask: What is the minimum precision required for oracle outputs to retain the unrestricted convergence rates? We characterize this precision for every $p\geq 1$ by deriving information theoretic lower bounds and by providing quantizers that (almost) achieve these lower bounds. Our quantizers are new and easy to implement. In particular, our results are exact for $p=2$ and $p=\infty$, showing that the minimum precision needed in these settings is $Θ(d)$ and $Θ(\log d)$, respectively. The latter result is surprising since recovering the gradient vector will require $Ω(d)$ bits.
Submitted 24 January, 2020;
originally announced January 2020.
-
RATQ: A Universal Fixed-Length Quantizer for Stochastic Optimization
Authors:
Prathamesh Mayekar,
Himanshu Tyagi
Abstract:
We present Rotated Adaptive Tetra-iterated Quantizer (RATQ), a fixed-length quantizer for gradients in first order stochastic optimization. RATQ is easy to implement and involves only a Hadamard transform computation and adaptive uniform quantization with appropriately chosen dynamic ranges. For noisy gradients with almost surely bounded Euclidean norms, we establish an information theoretic lower bound for optimization accuracy using finite precision gradients and show that RATQ almost attains this lower bound.
For mean square bounded noisy gradients, we use a gain-shape quantizer which separately quantizes the Euclidean norm and uses RATQ to quantize the normalized unit norm vector. We establish lower bounds for performance of any optimization procedure and shape quantizer, when used with a uniform gain quantizer. Finally, we propose an adaptive quantizer for gain which when used with RATQ for shape quantizer outperforms uniform gain quantization and is, in fact, close to optimal.
As a by-product, we show that our fixed-length quantizer RATQ has almost the same performance as the optimal variable-length quantizers for distributed mean estimation. Also, we obtain an efficient quantizer for Gaussian vectors which attains a rate very close to the Gaussian rate-distortion function and is, in fact, universal for subgaussian input vectors.
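The abstract names two ingredients, a Hadamard transform and uniform quantization over a chosen dynamic range. The sketch below illustrates only those two ingredients in a simplified form and is not RATQ itself: it assumes a single fixed dynamic range `limit` rather than RATQ's adaptively chosen ranges, uses deterministic nearest-point rounding, and adds a random sign flip before the transform as an assumed way of randomizing the rotation. All function names are hypothetical.

```python
import math
import random


def fwht(vec):
    """Normalized fast Walsh-Hadamard transform (length must be a power of 2).

    With the 1/sqrt(n) scaling the transform is orthogonal and symmetric,
    so applying it twice returns the input.
    """
    out = list(vec)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [x * scale for x in out]


def uniform_quantize(x, limit, levels):
    """Clip x to [-limit, limit], then round to the nearest of `levels` points."""
    x = max(-limit, min(limit, x))
    step = 2.0 * limit / (levels - 1)
    return -limit + round((x + limit) / step) * step


def rotate_and_quantize(vec, signs, limit, levels):
    """Flip signs, apply the Hadamard transform, quantize each coordinate."""
    rotated = fwht([s * v for s, v in zip(signs, vec)])
    return [uniform_quantize(x, limit, levels) for x in rotated]


def dequantize(quantized, signs):
    """Invert the rotation: Hadamard transform again, then undo the sign flips."""
    return [s * v for s, v in zip(signs, fwht(quantized))]


if __name__ == "__main__":
    rng = random.Random(0)
    gradient = [0.3, -0.1, 0.2, 0.05]
    signs = [rng.choice((-1, 1)) for _ in gradient]
    q = rotate_and_quantize(gradient, signs, limit=1.0, levels=257)
    print(dequantize(q, signs))
```

Because the rotation is orthogonal, the reconstruction error has the same Euclidean norm as the per-coordinate quantization error in the rotated domain, which is why a well-chosen range for the rotated coordinates matters.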
Submitted 16 December, 2019; v1 submitted 22 August, 2019;
originally announced August 2019.
-
Optimal Source Codes for Timely Updates
Authors:
Prathamesh Mayekar,
Parimal Parag,
Himanshu Tyagi
Abstract:
A transmitter observing a sequence of independent and identically distributed random variables seeks to keep a receiver updated about its latest observations. The receiver need not be apprised about each symbol seen by the transmitter, but needs to output a symbol at each time instant $t$. If at time $t$ the receiver outputs the symbol seen by the transmitter at time $U(t)\leq t$, the age of information at the receiver at time $t$ is $t-U(t)$. We study the design of lossless source codes that enable transmission with minimum average age at the receiver. We show that the asymptotic minimum average age can be attained up to a constant gap by the Shannon codes for a tilted version of the original pmf generating the symbols, which can be computed easily by solving an optimization problem. Furthermore, we exhibit an example with alphabet $\mathcal{X}$ where Shannon codes for the original pmf incur an asymptotic average age of a factor $O(\sqrt{\log |\mathcal{X}|})$ more than that achieved by our codes. Underlying our prescription for optimal codes is a new variational formula for integer moments of random variables, which may be of independent interest. Also, we discuss possible extensions of our formulation to randomized schemes and to the erasure channel, and include a treatment of the related problem of source coding for minimum average queuing delay.
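For readers unfamiliar with Shannon codes, the sketch below computes the standard Shannon codeword lengths $\lceil -\log_2 p(x) \rceil$ for a pmf and checks Kraft's inequality, which guarantees that a prefix-free code with those lengths exists. It does not reproduce the paper's tilting step: the optimization that produces the tilted pmf is not specified in the abstract, so the functions simply accept whichever pmf is supplied. The function names are hypothetical.

```python
import math


def shannon_code_lengths(pmf):
    """Shannon codeword lengths ceil(-log2 p) for each symbol probability p."""
    if any(p <= 0 for p in pmf) or abs(sum(pmf) - 1.0) > 1e-9:
        raise ValueError("pmf must have positive entries summing to 1")
    return [math.ceil(-math.log2(p)) for p in pmf]


def kraft_sum(lengths):
    """Sum of 2^(-length); a value <= 1 means a prefix-free code exists."""
    return sum(2.0 ** -length for length in lengths)


def expected_length(pmf, lengths):
    """Average codeword length when symbols are drawn from `pmf`."""
    return sum(p * length for p, length in zip(pmf, lengths))


if __name__ == "__main__":
    pmf = [0.5, 0.25, 0.125, 0.125]
    lengths = shannon_code_lengths(pmf)
    print(lengths, kraft_sum(lengths), expected_length(pmf, lengths))
```

To code for a different distribution than the one generating the symbols, as the abstract describes, one would pass that distribution to `shannon_code_lengths` and the true pmf to `expected_length`.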
Submitted 27 March, 2020; v1 submitted 12 October, 2018;
originally announced October 2018.