Search | arXiv e-print repository

No-Regret Algorithms for Safe Bayesian Optimization with Monotonicity Constraints

Authors: Arpan Losalka, Jonathan Scarlett

Abstract: We consider the problem of sequentially maximizing an unknown function $f$ over a set of actions of the form $(s,\mathbf{x})$, where the selected actions must satisfy a safety constraint with respect to an unknown safety function $g$. We model $f$ and $g$ as lying in a reproducing kernel Hilbert space (RKHS), which facilitates the use of Gaussian process methods. While existing works for this sett… ▽ More We consider the problem of sequentially maximizing an unknown function $f$ over a set of actions of the form $(s,\mathbf{x})$, where the selected actions must satisfy a safety constraint with respect to an unknown safety function $g$. We model $f$ and $g$ as lying in a reproducing kernel Hilbert space (RKHS), which facilitates the use of Gaussian process methods. While existing works for this setting have provided algorithms that are guaranteed to identify a near-optimal safe action, the problem of attaining low cumulative regret has remained largely unexplored, with a key challenge being that expanding the safe region can incur high regret. To address this challenge, we show that if $g$ is monotone with respect to just the single variable $s$ (with no such constraint on $f$), sublinear regret becomes achievable with our proposed algorithm. In addition, we show that a modified version of our algorithm is able to attain sublinear regret (for suitably defined notions of regret) for the task of finding a near-optimal $s$ corresponding to every $\mathbf{x}$, as opposed to only finding the global safe optimum. Our findings are supported with empirical evaluations on various objective and safety functions. △ Less

Submitted 5 June, 2024; originally announced June 2024.

Journal ref: Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:3232-3240, 2024

arXiv:2401.05716 [pdf, other]

Kernelized Normalizing Constant Estimation: Bridging Bayesian Quadrature and Bayesian Optimization

Authors: Xu Cai, Jonathan Scarlett

Abstract: In this paper, we study the problem of estimating the normalizing constant $\int e^{-λf(x)}dx$ through queries to the black-box function $f$, where $f$ belongs to a reproducing kernel Hilbert space (RKHS), and $λ$ is a problem parameter. We show that to estimate the normalizing constant within a small relative error, the level of difficulty depends on the value of $λ$: When $λ$ approaches zero, th… ▽ More In this paper, we study the problem of estimating the normalizing constant $\int e^{-λf(x)}dx$ through queries to the black-box function $f$, where $f$ belongs to a reproducing kernel Hilbert space (RKHS), and $λ$ is a problem parameter. We show that to estimate the normalizing constant within a small relative error, the level of difficulty depends on the value of $λ$: When $λ$ approaches zero, the problem is similar to Bayesian quadrature (BQ), while when $λ$ approaches infinity, the problem is similar to Bayesian optimization (BO). More generally, the problem varies between BQ and BO. We find that this pattern holds true even when the function evaluations are noisy, bringing new aspects to this topic. Our findings are supported by both algorithm-independent lower bounds and algorithmic upper bounds, as well as simulation studies conducted on a variety of benchmark functions. △ Less

Submitted 11 January, 2024; originally announced January 2024.

Comments: AAAI-24

arXiv:2310.03758 [pdf, other]

A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing

Authors: Junren Chen, Jonathan Scarlett, Michael K. Ng, Zhaoqiang Liu

Abstract: In generative compressed sensing (GCS), we want to recover a signal $\mathbf{x}^* \in \mathbb{R}^n$ from $m$ measurements ($m\ll n$) using a generative prior $\mathbf{x}^*\in G(\mathbb{B}_2^k(r))$, where $G$ is typically an $L$-Lipschitz continuous generative model and $\mathbb{B}_2^k(r)$ represents the radius-$r$ $\ell_2$-ball in $\mathbb{R}^k$. Under nonlinear measurements, most prior results ar… ▽ More In generative compressed sensing (GCS), we want to recover a signal $\mathbf{x}^* \in \mathbb{R}^n$ from $m$ measurements ($m\ll n$) using a generative prior $\mathbf{x}^*\in G(\mathbb{B}_2^k(r))$, where $G$ is typically an $L$-Lipschitz continuous generative model and $\mathbb{B}_2^k(r)$ represents the radius-$r$ $\ell_2$-ball in $\mathbb{R}^k$. Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously. In this paper, we build a unified framework to derive uniform recovery guarantees for nonlinear GCS where the observation model is nonlinear and possibly discontinuous or unknown. Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples. Specifically, using a single realization of the sensing ensemble and generalized Lasso, {\em all} $\mathbf{x}^*\in G(\mathbb{B}_2^k(r))$ can be recovered up to an $\ell_2$-error at most $ε$ using roughly $\tilde{O}({k}/{ε^2})$ samples, with omitted logarithmic factors typically being dominated by $\log L$. Notably, this almost coincides with existing non-uniform guarantees up to logarithmic factors, hence the uniformity costs very little. As part of our technical contributions, we introduce the Lipschitz approximation to handle discontinuous observation models. We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy. Experimental results are presented to corroborate our theory. △ Less

Submitted 9 October, 2023; v1 submitted 25 September, 2023; originally announced October 2023.

Comments: Accepted to NeurIPS 2023

arXiv:2304.12680 [pdf, ps, other]

Communication-Constrained Bandits under Additive Gaussian Noise

Authors: Prathamesh Mayekar, Jonathan Scarlett, Vincent Y. F. Tan

Abstract: We study a distributed stochastic multi-armed bandit where a client supplies the learner with communication-constrained feedback based on the rewards for the corresponding arm pulls. In our setup, the client must encode the rewards such that the second moment of the encoded rewards is no more than $P$, and this encoded reward is further corrupted by additive Gaussian noise of variance $σ^2$; the l… ▽ More We study a distributed stochastic multi-armed bandit where a client supplies the learner with communication-constrained feedback based on the rewards for the corresponding arm pulls. In our setup, the client must encode the rewards such that the second moment of the encoded rewards is no more than $P$, and this encoded reward is further corrupted by additive Gaussian noise of variance $σ^2$; the learner only has access to this corrupted reward. For this setting, we derive an information-theoretic lower bound of $Ω\left(\sqrt{\frac{KT}{\mathtt{SNR} \wedge1}} \right)$ on the minimax regret of any scheme, where $ \mathtt{SNR} := \frac{P}{σ^2}$, and $K$ and $T$ are the number of arms and time horizon, respectively. Furthermore, we propose a multi-phase bandit algorithm, $\mathtt{UE\text{-}UCB++}$, which matches this lower bound to a minor additive factor. $\mathtt{UE\text{-}UCB++}$ performs uniform exploration in its initial phases and then utilizes the {\em upper confidence bound }(UCB) bandit algorithm in its final phase. An interesting feature of $\mathtt{UE\text{-}UCB++}$ is that the coarser estimates of the mean rewards formed during a uniform exploration phase help to refine the encoding protocol in the next phase, leading to more accurate mean estimates of the rewards in the subsequent phase. This positive reinforcement cycle is critical to reducing the number of uniform exploration rounds and closely matching our lower bound. △ Less

Submitted 6 June, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

arXiv:2211.05430 [pdf, other]

Regret Bounds for Noise-Free Cascaded Kernelized Bandits

Authors: Zihan Li, Jonathan Scarlett

Abstract: We consider optimizing a function network in the noise-free grey-box setting with RKHS function classes, where the exact intermediate results are observable. We assume that the structure of the network is known (but not the underlying functions comprising it), and we study three types of structures: (1) chain: a cascade of scalar-valued functions, (2) multi-output chain: a cascade of vector-valued… ▽ More We consider optimizing a function network in the noise-free grey-box setting with RKHS function classes, where the exact intermediate results are observable. We assume that the structure of the network is known (but not the underlying functions comprising it), and we study three types of structures: (1) chain: a cascade of scalar-valued functions, (2) multi-output chain: a cascade of vector-valued functions, and (3) feed-forward network: a fully connected feed-forward network of scalar-valued functions. We propose a sequential upper confidence bound based algorithm GPN-UCB along with a general theoretical upper bound on the cumulative regret. In addition, we propose a non-adaptive sampling based method along with its theoretical upper bound on the simple regret for the Matérn kernel. We also provide algorithm-independent lower bounds on the simple regret and cumulative regret. Our regret bounds for GPN-UCB have the same dependence on the time horizon as the best known in the vanilla black-box setting, as well as near-optimal dependencies on other parameters (e.g., RKHS norm and network length). △ Less

Submitted 13 May, 2024; v1 submitted 10 November, 2022; originally announced November 2022.

Comments: TMLR

arXiv:2211.01561 [pdf, other]

Benefits of Monotonicity in Safe Exploration with Gaussian Processes

Authors: Arpan Losalka, Jonathan Scarlett

Abstract: We consider the problem of sequentially maximising an unknown function over a set of actions while ensuring that every sampled point has a function value below a given safety threshold. We model the function using kernel-based and Gaussian process methods, while differing from previous works in our assumption that the function is monotonically increasing with respect to a \emph{safety variable}. T… ▽ More We consider the problem of sequentially maximising an unknown function over a set of actions while ensuring that every sampled point has a function value below a given safety threshold. We model the function using kernel-based and Gaussian process methods, while differing from previous works in our assumption that the function is monotonically increasing with respect to a \emph{safety variable}. This assumption is motivated by various practical applications such as adaptive clinical trial design and robotics. Taking inspiration from the \textsc{\sffamily GP-UCB} and \textsc{\sffamily SafeOpt} algorithms, we propose an algorithm, monotone safe {\sffamily UCB} (\textsc{\sffamily M-SafeUCB}) for this task. We show that \textsc{\sffamily M-SafeUCB} enjoys theoretical guarantees in terms of safety, a suitably-defined regret notion, and approximately finding the entire safe boundary. In addition, we illustrate that the monotonicity assumption yields significant benefits in terms of the guarantees obtained, as well as algorithmic simplicity and efficiency. We support our theoretical findings by performing empirical evaluations on a variety of functions, including a simulated clinical trial experiment. △ Less

Submitted 19 June, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

arXiv:2210.01295 [pdf, other]

Max-Quantile Grouped Infinite-Arm Bandits

Authors: Ivan Lau, Yan Hao Ling, Mayank Shrivastava, Jonathan Scarlett

Abstract: In this paper, we consider a bandit problem in which there are a number of groups each consisting of infinitely many arms. Whenever a new arm is requested from a given group, its mean reward is drawn from an unknown reservoir distribution (different for each group), and the uncertainty in the arm's mean reward can only be reduced via subsequent pulls of the arm. The goal is to identify the infinit… ▽ More In this paper, we consider a bandit problem in which there are a number of groups each consisting of infinitely many arms. Whenever a new arm is requested from a given group, its mean reward is drawn from an unknown reservoir distribution (different for each group), and the uncertainty in the arm's mean reward can only be reduced via subsequent pulls of the arm. The goal is to identify the infinite-arm group whose reservoir distribution has the highest $(1-α)$-quantile (e.g., median if $α= \frac{1}{2}$), using as few total arm pulls as possible. We introduce a two-step algorithm that first requests a fixed number of arms from each group and then runs a finite-arm grouped max-quantile bandit algorithm. We characterize both the instance-dependent and worst-case regret, and provide a matching lower bound for the latter, while discussing various strengths, weaknesses, algorithmic improvements, and potential lower bounds associated with our instance-dependent upper bounds. △ Less

Submitted 1 February, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

Comments: ALT 2023

arXiv:2206.14373 [pdf, other]

Theoretical Perspectives on Deep Learning Methods in Inverse Problems

Authors: Jonathan Scarlett, Reinhard Heckel, Miguel R. D. Rodrigues, Paul Hand, Yonina C. Eldar

Abstract: In recent years, there have been significant advances in the use of deep learning methods in inverse problems such as denoising, compressive sensing, inpainting, and super-resolution. While this line of works has predominantly been driven by practical algorithms and experiments, it has also given rise to a variety of intriguing theoretical problems. In this paper, we survey some of the prominent t… ▽ More In recent years, there have been significant advances in the use of deep learning methods in inverse problems such as denoising, compressive sensing, inpainting, and super-resolution. While this line of works has predominantly been driven by practical algorithms and experiments, it has also given rise to a variety of intriguing theoretical problems. In this paper, we survey some of the prominent theoretical developments in this line of works, focusing in particular on generative priors, untrained neural network priors, and unfolding algorithms. In addition to summarizing existing results in these topics, we highlight several ongoing challenges and open problems. △ Less

Submitted 29 January, 2023; v1 submitted 28 June, 2022; originally announced June 2022.

Comments: IEEE JSAIT (Special Issue on Deep Learning for Inverse Problems)

arXiv:2203.09693 [pdf, other]

Generative Principal Component Analysis

Authors: Zhaoqiang Liu, Jiulong Liu, Subhroshekhar Ghosh, Jun Han, Jonathan Scarlett

Abstract: In this paper, we study the problem of principal component analysis with generative modeling assumptions, adopting a general model for the observed matrix that encompasses notable special cases, including spiked matrix recovery and phase retrieval. The key assumption is that the underlying signal lies near the range of an $L$-Lipschitz continuous generative model with bounded $k$-dimensional input… ▽ More In this paper, we study the problem of principal component analysis with generative modeling assumptions, adopting a general model for the observed matrix that encompasses notable special cases, including spiked matrix recovery and phase retrieval. The key assumption is that the underlying signal lies near the range of an $L$-Lipschitz continuous generative model with bounded $k$-dimensional inputs. We propose a quadratic estimator, and show that it enjoys a statistical rate of order $\sqrt{\frac{k\log L}{m}}$, where $m$ is the number of samples. We also provide a near-matching algorithm-independent lower bound. Moreover, we provide a variant of the classic power method, which projects the calculated data onto the range of the generative model during each iteration. We show that under suitable conditions, this method converges exponentially fast to a point achieving the above-mentioned statistical rate. We perform experiments on various image datasets for spiked matrix and phase retrieval models, and illustrate performance gains of our method to the classic power method and the truncated power method devised for sparse principal component analysis. △ Less

Submitted 7 September, 2022; v1 submitted 17 March, 2022; originally announced March 2022.

Comments: ICLR 2022 paper + additional appendix on algorithm-independent lower bounds + corrected experimental results for the Fashion-MNIST dataset

arXiv:2202.10615 [pdf, other]

On Average-Case Error Bounds for Kernel-Based Bayesian Quadrature

Authors: Xu Cai, Chi Thanh Lam, Jonathan Scarlett

Abstract: In this paper, we study error bounds for {\em Bayesian quadrature} (BQ), with an emphasis on noisy settings, randomized algorithms, and average-case performance measures. We seek to approximate the integral of functions in a {\em Reproducing Kernel Hilbert Space} (RKHS), particularly focusing on the Matérn-$ν$ and squared exponential (SE) kernels, with samples from the function potentially being c… ▽ More In this paper, we study error bounds for {\em Bayesian quadrature} (BQ), with an emphasis on noisy settings, randomized algorithms, and average-case performance measures. We seek to approximate the integral of functions in a {\em Reproducing Kernel Hilbert Space} (RKHS), particularly focusing on the Matérn-$ν$ and squared exponential (SE) kernels, with samples from the function potentially being corrupted by Gaussian noise. We provide a two-step meta-algorithm that serves as a general tool for relating the average-case quadrature error with the $L^2$-function approximation error. When specialized to the Matérn kernel, we recover an existing near-optimal error rate while avoiding the existing method of repeatedly sampling points. When specialized to other settings, we obtain new average-case results for settings including the SE kernel with noise and the Matérn kernel with misspecification. Finally, we present algorithm-independent lower bounds that have greater generality and/or give distinct proofs compared to existing ones. △ Less

Submitted 10 February, 2023; v1 submitted 21 February, 2022; originally announced February 2022.

arXiv:2202.04005 [pdf, ps, other]

Improved Convergence Rates for Sparse Approximation Methods in Kernel-Based Learning

Authors: Sattar Vakili, Jonathan Scarlett, Da-shan Shiu, Alberto Bernacchia

Abstract: Kernel-based models such as kernel ridge regression and Gaussian processes are ubiquitous in machine learning applications for regression and optimization. It is well known that a major downside for kernel-based models is the high computational cost; given a dataset of $n$ samples, the cost grows as $\mathcal{O}(n^3)$. Existing sparse approximation methods can yield a significant reduction in the… ▽ More Kernel-based models such as kernel ridge regression and Gaussian processes are ubiquitous in machine learning applications for regression and optimization. It is well known that a major downside for kernel-based models is the high computational cost; given a dataset of $n$ samples, the cost grows as $\mathcal{O}(n^3)$. Existing sparse approximation methods can yield a significant reduction in the computational cost, effectively reducing the actual cost down to as low as $\mathcal{O}(n)$ in certain cases. Despite this remarkable empirical success, significant gaps remain in the existing results for the analytical bounds on the error due to approximation. In this work, we provide novel confidence intervals for the Nyström method and the sparse variational Gaussian process approximation method, which we establish using novel interpretations of the approximate (surrogate) posterior variance of the models. Our confidence intervals lead to improved performance bounds in both regression and optimization problems. △ Less

Submitted 18 June, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

Comments: International Conference on Machine Learning (ICML) 2022

arXiv:2202.01850 [pdf, other]

A Robust Phased Elimination Algorithm for Corruption-Tolerant Gaussian Process Bandits

Authors: Ilija Bogunovic, Zihan Li, Andreas Krause, Jonathan Scarlett

Abstract: We consider the sequential optimization of an unknown, continuous, and expensive to evaluate reward function, from noisy and adversarially corrupted observed rewards. When the corruption attacks are subject to a suitable budget $C$ and the function lives in a Reproducing Kernel Hilbert Space (RKHS), the problem can be posed as corrupted Gaussian process (GP) bandit optimization. We propose a novel… ▽ More We consider the sequential optimization of an unknown, continuous, and expensive to evaluate reward function, from noisy and adversarially corrupted observed rewards. When the corruption attacks are subject to a suitable budget $C$ and the function lives in a Reproducing Kernel Hilbert Space (RKHS), the problem can be posed as corrupted Gaussian process (GP) bandit optimization. We propose a novel robust elimination-type algorithm that runs in epochs, combines exploration with infrequent switching to select a small subset of actions, and plays each action for multiple time instants. Our algorithm, Robust GP Phased Elimination (RGP-PE), successfully balances robustness to corruptions with exploration and exploitation such that its performance degrades minimally in the presence (or absence) of adversarial corruptions. When $T$ is the number of samples and $γ_T$ is the maximal information gain, the corruption-dependent term in our regret bound is $O(C γ_T^{3/2})$, which is significantly tighter than the existing $O(C \sqrt{T γ_T})$ for several commonly-considered kernels. We perform the first empirical study of robustness in the corrupted GP bandit setting, and show that our algorithm is robust against a variety of adversarial attacks. △ Less

Submitted 28 March, 2022; v1 submitted 3 February, 2022; originally announced February 2022.

Comments: Added references

arXiv:2111.08862 [pdf, other]

Max-Min Grouped Bandits

Authors: Zhenlin Wang, Jonathan Scarlett

Abstract: In this paper, we introduce a multi-armed bandit problem termed max-min grouped bandits, in which the arms are arranged in possibly-overlap** groups, and the goal is to find the group whose worst arm has the highest mean reward. This problem is of interest in applications such as recommendation systems and resource allocation, and is also closely related to widely-studied robust optimization pro… ▽ More In this paper, we introduce a multi-armed bandit problem termed max-min grouped bandits, in which the arms are arranged in possibly-overlap** groups, and the goal is to find the group whose worst arm has the highest mean reward. This problem is of interest in applications such as recommendation systems and resource allocation, and is also closely related to widely-studied robust optimization problems. We present two algorithms based successive elimination and robust optimization, and derive upper bounds on the number of samples to guarantee finding a max-min optimal or near-optimal group, as well as an algorithm-independent lower bound. We discuss the degree of tightness of our bounds in various cases of interest, and the difficulties in deriving uniformly tight bounds. △ Less

Submitted 14 March, 2022; v1 submitted 16 November, 2021; originally announced November 2021.

Comments: AAAI 2022 paper + technical appendix (supplementary material), single-column format

arXiv:2110.15458 [pdf, ps, other]

Open Problem: Tight Online Confidence Intervals for RKHS Elements

Authors: Sattar Vakili, Jonathan Scarlett, Tara Javidi

Abstract: Confidence intervals are a crucial building block in the analysis of various online learning problems. The analysis of kernel based bandit and reinforcement learning problems utilize confidence intervals applicable to the elements of a reproducing kernel Hilbert space (RKHS). However, the existing confidence bounds do not appear to be tight, resulting in suboptimal regret bounds. In fact, the exis… ▽ More Confidence intervals are a crucial building block in the analysis of various online learning problems. The analysis of kernel based bandit and reinforcement learning problems utilize confidence intervals applicable to the elements of a reproducing kernel Hilbert space (RKHS). However, the existing confidence bounds do not appear to be tight, resulting in suboptimal regret bounds. In fact, the existing regret bounds for several kernelized bandit algorithms (e.g., GP-UCB, GP-TS, and their variants) may fail to even be sublinear. It is unclear whether the suboptimal regret bound is a fundamental shortcoming of these algorithms or an artifact of the proof, and the main challenge seems to stem from the online (sequential) nature of the observation points. We formalize the question of online confidence intervals in the RKHS setting and overview the existing results. △ Less

Submitted 28 October, 2021; originally announced October 2021.

arXiv:2110.08449 [pdf, other]

Adversarial Attacks on Gaussian Process Bandits

Authors: Eric Han, Jonathan Scarlett

Abstract: Gaussian processes (GP) are a widely-adopted tool used to sequentially optimize black-box functions, where evaluations are costly and potentially noisy. Recent works on GP bandits have proposed to move beyond random noise and devise algorithms robust to adversarial attacks. This paper studies this problem from the attacker's perspective, proposing various adversarial attack methods with differing… ▽ More Gaussian processes (GP) are a widely-adopted tool used to sequentially optimize black-box functions, where evaluations are costly and potentially noisy. Recent works on GP bandits have proposed to move beyond random noise and devise algorithms robust to adversarial attacks. This paper studies this problem from the attacker's perspective, proposing various adversarial attack methods with differing assumptions on the attacker's strength and prior information. Our goal is to understand adversarial attacks on GP bandits from theoretical and practical perspectives. We focus primarily on targeted attacks on the popular GP-UCB algorithm and a related elimination-based algorithm, based on adversarially perturbing the function $f$ to produce another function $\tilde{f}$ whose optima are in some target region $\mathcal{R}_{\rm target}$. Based on our theoretical analysis, we devise both white-box attacks (known $f$) and black-box attacks (unknown $f$), with the former including a Subtraction attack and Clip** attack, and the latter including an Aggressive subtraction attack. We demonstrate that adversarial attacks on GP bandits can succeed in forcing the algorithm towards $\mathcal{R}_{\rm target}$ even with a low attack budget, and we test our attacks' effectiveness on a diverse range of objective functions. △ Less

Submitted 16 June, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

Comments: Accepted to ICML 2022

MSC Class: 90C26 ACM Class: G.1.6

arXiv:2110.07788 [pdf, other]

Gaussian Process Bandit Optimization with Few Batches

Authors: Zihan Li, Jonathan Scarlett

Abstract: In this paper, we consider the problem of black-box optimization using Gaussian Process (GP) bandit optimization with a small number of batches. Assuming the unknown function has a low norm in the Reproducing Kernel Hilbert Space (RKHS), we introduce a batch algorithm inspired by batched finite-arm bandit algorithms, and show that it achieves the cumulative regret upper bound… ▽ More In this paper, we consider the problem of black-box optimization using Gaussian Process (GP) bandit optimization with a small number of batches. Assuming the unknown function has a low norm in the Reproducing Kernel Hilbert Space (RKHS), we introduce a batch algorithm inspired by batched finite-arm bandit algorithms, and show that it achieves the cumulative regret upper bound $O^\ast(\sqrt{Tγ_T})$ using $O(\log\log T)$ batches within time horizon $T$, where the $O^\ast(\cdot)$ notation hides dimension-independent logarithmic factors and $γ_T$ is the maximum information gain associated with the kernel. This bound is near-optimal for several kernels of interest and improves on the typical $O^\ast(\sqrt{T}γ_T)$ bound, and our approach is arguably the simplest among algorithms attaining this improvement. In addition, in the case of a constant number of batches (not depending on $T$), we propose a modified version of our algorithm, and characterize how the regret is impacted by the number of batches, focusing on the squared exponential and Matérn kernels. The algorithmic upper bounds are shown to be nearly minimax optimal via analogous algorithm-independent lower bounds. △ Less

Submitted 21 February, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

Comments: AISTATS 2022

arXiv:2108.03570 [pdf, other]

Robust 1-bit Compressive Sensing with Partial Gaussian Circulant Matrices and Generative Priors

Authors: Zhaoqiang Liu, Subhroshekhar Ghosh, Jun Han, Jonathan Scarlett

Abstract: In 1-bit compressive sensing, each measurement is quantized to a single bit, namely the sign of a linear function of an unknown vector, and the goal is to accurately recover the vector. While it is most popular to assume a standard Gaussian sensing matrix for 1-bit compressive sensing, using structured sensing matrices such as partial Gaussian circulant matrices is of significant practical importa… ▽ More In 1-bit compressive sensing, each measurement is quantized to a single bit, namely the sign of a linear function of an unknown vector, and the goal is to accurately recover the vector. While it is most popular to assume a standard Gaussian sensing matrix for 1-bit compressive sensing, using structured sensing matrices such as partial Gaussian circulant matrices is of significant practical importance due to their faster matrix operations. In this paper, we provide recovery guarantees for a correlation-based optimization algorithm for robust 1-bit compressive sensing with randomly signed partial Gaussian circulant matrices and generative models. Under suitable assumptions, we match guarantees that were previously only known to hold for i.i.d.~Gaussian matrices that require significantly more computation. We make use of a practical iterative algorithm, and perform numerical experiments on image datasets to corroborate our theoretical results. △ Less

Submitted 8 August, 2021; originally announced August 2021.

arXiv:2106.15358 [pdf, other]

Towards Sample-Optimal Compressive Phase Retrieval with Sparse and Generative Priors

Authors: Zhaoqiang Liu, Subhroshekhar Ghosh, Jonathan Scarlett

Abstract: Compressive phase retrieval is a popular variant of the standard compressive sensing problem in which the measurements only contain magnitude information. In this paper, motivated by recent advances in deep generative models, we provide recovery guarantees with near-optimal sample complexity for phase retrieval with generative priors. We first show that when using i.i.d. Gaussian measurements and… ▽ More Compressive phase retrieval is a popular variant of the standard compressive sensing problem in which the measurements only contain magnitude information. In this paper, motivated by recent advances in deep generative models, we provide recovery guarantees with near-optimal sample complexity for phase retrieval with generative priors. We first show that when using i.i.d. Gaussian measurements and an $L$-Lipschitz continuous generative model with bounded $k$-dimensional inputs, roughly $O(k \log L)$ samples suffice to guarantee that any signal minimizing an amplitude-based empirical loss function is close to the true signal. Attaining this sample complexity with a practical algorithm remains a difficult challenge, and finding a good initialization for gradient-based methods has been observed to pose a major bottleneck. To partially address this, we further show that roughly $O(k \log L)$ samples ensure sufficient closeness between the underlying signal and any {\em globally optimal} solution to an optimization problem designed for spectral initialization (though finding such a solution may still be challenging). We also adapt this result to sparse phase retrieval, and show that $O(s \log n)$ samples are sufficient for a similar guarantee when the underlying signal is $s$-sparse and $n$-dimensional, matching an information-theoretic lower bound. While these guarantees do not directly correspond to a practical algorithm, we propose a practical spectral initialization method motivated by our findings, and experimentally observe performance gains over various existing spectral initialization methods for sparse phase retrieval. △ Less

Submitted 17 October, 2021; v1 submitted 29 June, 2021; originally announced June 2021.

Comments: Accepted to NeurIPS 2021

arXiv:2102.05793 [pdf, other]

Lenient Regret and Good-Action Identification in Gaussian Process Bandits

Authors: Xu Cai, Selwyn Gomes, Jonathan Scarlett

Abstract: In this paper, we study the problem of Gaussian process (GP) bandits under relaxed optimization criteria stating that any function value above a certain threshold is "good enough". On the theoretical side, we study various {\em lenient regret} notions in which all near-optimal actions incur zero penalty, and provide upper bounds on the lenient regret for GP-UCB and an elimination algorithm, circum… ▽ More In this paper, we study the problem of Gaussian process (GP) bandits under relaxed optimization criteria stating that any function value above a certain threshold is "good enough". On the theoretical side, we study various {\em lenient regret} notions in which all near-optimal actions incur zero penalty, and provide upper bounds on the lenient regret for GP-UCB and an elimination algorithm, circumventing the usual $O(\sqrt{T})$ term (with time horizon $T$) resulting from zooming extremely close towards the function maximum. In addition, we complement these upper bounds with algorithm-independent lower bounds. On the practical side, we consider the problem of finding a single "good action" according to a known pre-specified threshold, and introduce several good-action identification algorithms that exploit knowledge of the threshold. We experimentally find that such algorithms can often find a good action faster than standard optimization-based approaches. △ Less

Submitted 26 May, 2021; v1 submitted 10 February, 2021; originally announced February 2021.

Comments: ICML 2021

arXiv:2012.13088 [pdf, other]

High-Dimensional Bayesian Optimization via Tree-Structured Additive Models

Authors: Eric Han, Ishank Arora, Jonathan Scarlett

Abstract: Bayesian Optimization (BO) has shown significant success in tackling expensive low-dimensional black-box optimization problems. Many optimization problems of interest are high-dimensional, and scaling BO to such settings remains an important challenge. In this paper, we consider generalized additive models in which low-dimensional functions with overlap** subsets of variables are composed to mod… ▽ More Bayesian Optimization (BO) has shown significant success in tackling expensive low-dimensional black-box optimization problems. Many optimization problems of interest are high-dimensional, and scaling BO to such settings remains an important challenge. In this paper, we consider generalized additive models in which low-dimensional functions with overlap** subsets of variables are composed to model a high-dimensional target function. Our goal is to lower the computational resources required and facilitate faster model learning by reducing the model complexity while retaining the sample-efficiency of existing methods. Specifically, we constrain the underlying dependency graphs to tree structures in order to facilitate both the structure learning and optimization of the acquisition function. For the former, we propose a hybrid graph learning algorithm based on Gibbs sampling and mutation. In addition, we propose a novel zooming-based algorithm that permits generalized additive models to be employed more efficiently in the case of continuous domains. We demonstrate and discuss the efficacy of our approach via a range of experiments on synthetic functions and real-world datasets. △ Less

Submitted 23 December, 2020; originally announced December 2020.

Comments: To appear in AAAI 2021

MSC Class: 90C26 ACM Class: G.1.6

Journal ref: Vol. 35 No. 9: AAAI-21 Technical Tracks 9 (2021) 7630-7638

arXiv:2008.08757 [pdf, other]

On Lower Bounds for Standard and Robust Gaussian Process Bandit Optimization

Authors: Xu Cai, Jonathan Scarlett

Abstract: In this paper, we consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm is some Reproducing Kernel Hilbert Space (RKHS), which can be viewed as a non-Bayesian Gaussian process bandit problem. In the standard noisy setting, we provide a novel proof technique for deriving lower bounds on the regret, with benefits including simplicity… ▽ More In this paper, we consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm is some Reproducing Kernel Hilbert Space (RKHS), which can be viewed as a non-Bayesian Gaussian process bandit problem. In the standard noisy setting, we provide a novel proof technique for deriving lower bounds on the regret, with benefits including simplicity, versatility, and an improved dependence on the error probability. In a robust setting in which every sampled point may be perturbed by a suitably-constrained adversary, we provide a novel lower bound for deterministic strategies, demonstrating an inevitable joint dependence of the cumulative regret on the corruption level and the time horizon, in contrast with existing lower bounds that only characterize the individual dependencies. Furthermore, in a distinct robust setting in which the final point is perturbed by an adversary, we strengthen an existing lower bound that only holds for target success probabilities very close to one, by allowing for arbitrary success probabilities above $\frac{2}{3}$. △ Less

Submitted 24 May, 2021; v1 submitted 19 August, 2020; originally announced August 2020.

Comments: ICML 2021

arXiv:2007.03285 [pdf, other]

Stochastic Linear Bandits Robust to Adversarial Attacks

Authors: Ilija Bogunovic, Arpan Losalka, Andreas Krause, Jonathan Scarlett

Abstract: We consider a stochastic linear bandit problem in which the rewards are not only subject to random noise, but also adversarial attacks subject to a suitable budget $C$ (i.e., an upper bound on the sum of corruption magnitudes across the time horizon). We provide two variants of a Robust Phased Elimination algorithm, one that knows $C$ and one that does not. Both variants are shown to attain near-o… ▽ More We consider a stochastic linear bandit problem in which the rewards are not only subject to random noise, but also adversarial attacks subject to a suitable budget $C$ (i.e., an upper bound on the sum of corruption magnitudes across the time horizon). We provide two variants of a Robust Phased Elimination algorithm, one that knows $C$ and one that does not. Both variants are shown to attain near-optimal regret in the non-corrupted case $C = 0$, while incurring additional additive terms respectively having a linear and quadratic dependency on $C$ in general. We present algorithm independent lower bounds showing that these additive terms are near-optimal. In addition, in a contextual setting, we revisit a setup of diverse contexts, and show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$. △ Less

Submitted 27 October, 2020; v1 submitted 7 July, 2020; originally announced July 2020.

arXiv:2006.12415 [pdf, ps, other]

The Generalized Lasso with Nonlinear Observations and Generative Priors

Authors: Zhaoqiang Liu, Jonathan Scarlett

Abstract: In this paper, we study the problem of signal estimation from noisy non-linear measurements when the unknown $n$-dimensional signal is in the range of an $L$-Lipschitz continuous generative model with bounded $k$-dimensional inputs. We make the assumption of sub-Gaussian measurements, which is satisfied by a wide range of measurement models, such as linear, logistic, 1-bit, and other quantized mod… ▽ More In this paper, we study the problem of signal estimation from noisy non-linear measurements when the unknown $n$-dimensional signal is in the range of an $L$-Lipschitz continuous generative model with bounded $k$-dimensional inputs. We make the assumption of sub-Gaussian measurements, which is satisfied by a wide range of measurement models, such as linear, logistic, 1-bit, and other quantized models. In addition, we consider the impact of adversarial corruptions on these measurements. Our analysis is based on a generalized Lasso approach (Plan and Vershynin, 2016). We first provide a non-uniform recovery guarantee, which states that under i.i.d.~Gaussian measurements, roughly $O\left(\frac{k}{ε^2}\log L\right)$ samples suffice for recovery with an $\ell_2$-error of $ε$, and that this scheme is robust to adversarial noise. Then, we apply this result to neural network generative models, and discuss various extensions to other models and non-i.i.d.~measurements. Moreover, we show that our result can be extended to the uniform recovery guarantee under the assumption of a so-called local embedding property, which is satisfied by the 1-bit and censored Tobit models. △ Less

Submitted 8 October, 2020; v1 submitted 22 June, 2020; originally announced June 2020.

Comments: Accepted to NeurIPS 2020

arXiv:2003.01971 [pdf, other]

Corruption-Tolerant Gaussian Process Bandit Optimization

Authors: Ilija Bogunovic, Andreas Krause, Jonathan Scarlett

Abstract: We consider the problem of optimizing an unknown (typically non-convex) function with a bounded norm in some Reproducing Kernel Hilbert Space (RKHS), based on noisy bandit feedback. We consider a novel variant of this problem in which the point evaluations are not only corrupted by random noise, but also adversarial corruptions. We introduce an algorithm Fast-Slow GP-UCB based on Gaussian process… ▽ More We consider the problem of optimizing an unknown (typically non-convex) function with a bounded norm in some Reproducing Kernel Hilbert Space (RKHS), based on noisy bandit feedback. We consider a novel variant of this problem in which the point evaluations are not only corrupted by random noise, but also adversarial corruptions. We introduce an algorithm Fast-Slow GP-UCB based on Gaussian process methods, randomized selection between two instances labeled "fast" (but non-robust) and "slow" (but robust), enlarged confidence bounds, and the principle of optimism under uncertainty. We present a novel theoretical analysis upper bounding the cumulative regret in terms of the corruption level, the time horizon, and the underlying kernel, and we argue that certain dependencies cannot be improved. We observe that distinct algorithmic ideas are required depending on whether one is required to perform well in both the corrupted and non-corrupted settings, and whether the corruption level is known or not. △ Less

Submitted 4 March, 2020; originally announced March 2020.

Comments: Accepted to AISTATS 2020

arXiv:2002.08663 [pdf, ps, other]

Learning Gaussian Graphical Models via Multiplicative Weights

Authors: Anamay Chaturvedi, Jonathan Scarlett

Abstract: Graphical model selection in Markov random fields is a fundamental problem in statistics and machine learning. Two particularly prominent models, the Ising model and Gaussian model, have largely developed in parallel using different (though often related) techniques, and several practical algorithms with rigorous sample complexity bounds have been established for each. In this paper, we adapt a re… ▽ More Graphical model selection in Markov random fields is a fundamental problem in statistics and machine learning. Two particularly prominent models, the Ising model and Gaussian model, have largely developed in parallel using different (though often related) techniques, and several practical algorithms with rigorous sample complexity bounds have been established for each. In this paper, we adapt a recently proposed algorithm of Klivans and Meka (FOCS, 2017), based on the method of multiplicative weight updates, from the Ising model to the Gaussian model, via non-trivial modifications to both the algorithm and its analysis. The algorithm enjoys a sample complexity bound that is qualitatively similar to others in the literature, has a low runtime $O(mp^2)$ in the case of $m$ samples and $p$ nodes, and can trivially be implemented in an online manner. △ Less

Submitted 24 February, 2020; v1 submitted 20 February, 2020; originally announced February 2020.

Comments: AISTATS 2020

arXiv:2002.01697 [pdf, other]

Sample Complexity Bounds for 1-bit Compressive Sensing and Binary Stable Embeddings with Generative Priors

Authors: Zhaoqiang Liu, Selwyn Gomes, Avtansh Tiwari, Jonathan Scarlett

Abstract: The goal of standard 1-bit compressive sensing is to accurately recover an unknown sparse vector from binary-valued measurements, each indicating the sign of a linear function of the vector. Motivated by recent advances in compressive sensing with generative models, where a generative modeling assumption replaces the usual sparsity assumption, we study the problem of 1-bit compressive sensing with… ▽ More The goal of standard 1-bit compressive sensing is to accurately recover an unknown sparse vector from binary-valued measurements, each indicating the sign of a linear function of the vector. Motivated by recent advances in compressive sensing with generative models, where a generative modeling assumption replaces the usual sparsity assumption, we study the problem of 1-bit compressive sensing with generative models. We first consider noiseless 1-bit measurements, and provide sample complexity bounds for approximate recovery under i.i.d.~Gaussian measurements and a Lipschitz continuous generative prior, as well as a near-matching algorithm-independent lower bound. Moreover, we demonstrate that the Binary $ε$-Stable Embedding property, which characterizes the robustness of the reconstruction to measurement errors and noise, also holds for 1-bit compressive sensing with Lipschitz continuous generative models with sufficiently many Gaussian measurements. In addition, we apply our results to neural network generative models, and provide a proof-of-concept numerical experiment demonstrating significant improvements over sparsity-based approaches. △ Less

Submitted 22 June, 2020; v1 submitted 5 February, 2020; originally announced February 2020.

Comments: Accepted to ICML 2020

arXiv:2001.09327 [pdf, other]

Tight Regret Bounds for Noisy Optimization of a Brownian Motion

Authors: Zexin Wang, Vincent Y. F. Tan, Jonathan Scarlett

Abstract: We consider the problem of Bayesian optimization of a one-dimensional Brownian motion in which the $T$ adaptively chosen observations are corrupted by Gaussian noise. We show that as the smallest possible expected cumulative regret and the smallest possible expected simple regret scale as $Ω(σ\sqrt{T / \log (T)}) \cap \mathcal{O}(σ\sqrt{T} \cdot \log T)$ and… ▽ More We consider the problem of Bayesian optimization of a one-dimensional Brownian motion in which the $T$ adaptively chosen observations are corrupted by Gaussian noise. We show that as the smallest possible expected cumulative regret and the smallest possible expected simple regret scale as $Ω(σ\sqrt{T / \log (T)}) \cap \mathcal{O}(σ\sqrt{T} \cdot \log T)$ and $Ω(σ/ \sqrt{T \log (T)}) \cap \mathcal{O}(σ\log T / \sqrt{T})$ respectively, where $σ^2$ is the noise variance. Thus, our upper and lower bounds are tight up to a factor of $\mathcal{O}( (\log T)^{1.5} )$. The upper bound uses an algorithm based on confidence bounds and the Markov property of Brownian motion (among other useful properties), and the lower bound is based on a reduction to binary hypothesis testing. △ Less

Submitted 15 January, 2022; v1 submitted 25 January, 2020; originally announced January 2020.

arXiv:1911.02764 [pdf, other]

An Efficient Algorithm for Capacity-Approaching Noisy Adaptive Group Testing

Authors: Jonathan Scarlett

Abstract: In this paper, we consider the group testing problem with adaptive test designs and noisy outcomes. We propose a computationally efficient four-stage procedure with components including random binning, identification of bins containing defective items, 1-sparse recovery via channel codes, and a "clean-up" step to correct any errors from the earlier stages. We prove that the asymptotic required num… ▽ More In this paper, we consider the group testing problem with adaptive test designs and noisy outcomes. We propose a computationally efficient four-stage procedure with components including random binning, identification of bins containing defective items, 1-sparse recovery via channel codes, and a "clean-up" step to correct any errors from the earlier stages. We prove that the asymptotic required number of tests comes very close to the best known information-theoretic achievability bound (which is based on computationally intractable decoding), and approaches a capacity-based converse bound in the low-sparsity regime. △ Less

Submitted 7 November, 2019; originally announced November 2019.

Comments: ISIT 2019

arXiv:1909.07425 [pdf, other]

A Characteristic Function Approach to Deep Implicit Generative Modeling

Authors: Abdul Fatir Ansari, Jonathan Scarlett, Harold Soh

Abstract: Implicit Generative Models (IGMs) such as GANs have emerged as effective data-driven models for generating samples, particularly images. In this paper, we formulate the problem of learning an IGM as minimizing the expected distance between characteristic functions. Specifically, we minimize the distance between characteristic functions of the real and generated data distributions under a suitably-… ▽ More Implicit Generative Models (IGMs) such as GANs have emerged as effective data-driven models for generating samples, particularly images. In this paper, we formulate the problem of learning an IGM as minimizing the expected distance between characteristic functions. Specifically, we minimize the distance between characteristic functions of the real and generated data distributions under a suitably-chosen weighting distribution. This distance metric, which we term as the characteristic function distance (CFD), can be (approximately) computed with linear time-complexity in the number of samples, in contrast with the quadratic-time Maximum Mean Discrepancy (MMD). By replacing the discrepancy measure in the critic of a GAN with the CFD, we obtain a model that is simple to implement and stable to train. The proposed metric enjoys desirable theoretical properties including continuity and differentiability with respect to generator parameters, and continuity in the weak topology. We further propose a variation of the CFD in which the weighting distribution parameters are also optimized during training; this obviates the need for manual tuning, and leads to an improvement in test power relative to CFD. We demonstrate experimentally that our proposed method outperforms WGAN and MMD-GAN variants on a variety of unsupervised image generation benchmarks. △ Less

Submitted 16 June, 2020; v1 submitted 16 September, 2019; originally announced September 2019.

Comments: CVPR 2020 (Oral), Code available at https://github.com/clear-nus/OCFGAN

arXiv:1908.10744 [pdf, other]

Information-Theoretic Lower Bounds for Compressive Sensing with Generative Models

Authors: Zhaoqiang Liu, Jonathan Scarlett

Abstract: It has recently been shown that for compressive sensing, significantly fewer measurements may be required if the sparsity assumption is replaced by the assumption the unknown vector lies near the range of a suitably-chosen generative model. In particular, in (Bora {\em et al.}, 2017) it was shown roughly $O(k\log L)$ random Gaussian measurements suffice for accurate recovery when the generative mo… ▽ More It has recently been shown that for compressive sensing, significantly fewer measurements may be required if the sparsity assumption is replaced by the assumption the unknown vector lies near the range of a suitably-chosen generative model. In particular, in (Bora {\em et al.}, 2017) it was shown roughly $O(k\log L)$ random Gaussian measurements suffice for accurate recovery when the generative model is an $L$-Lipschitz function with bounded $k$-dimensional inputs, and $O(kd \log w)$ measurements suffice when the generative model is a $k$-input ReLU network with depth $d$ and width $w$. In this paper, we establish corresponding algorithm-independent lower bounds on the sample complexity using tools from minimax statistical analysis. In accordance with the above upper bounds, our results are summarized as follows: (i) We construct an $L$-Lipschitz generative model capable of generating group-sparse signals, and show that the resulting necessary number of measurements is $Ω(k \log L)$; (ii) Using similar ideas, we construct ReLU networks with high depth and/or high depth for which the necessary number of measurements scales as $Ω\big( kd \frac{\log w}{\log n}\big)$ (with output dimension $n$), and in some cases $Ω(kd \log w)$. As a result, we establish that the scaling laws derived in (Bora {\em et al.}, 2017) are optimal or near-optimal in the absence of further assumptions. △ Less

Submitted 10 March, 2020; v1 submitted 28 August, 2019; originally announced August 2019.

Comments: To appear in IEEE Journal on Selected Areas in Information Theory. In this version, Theorem 7 was updated to allow for general depth/width scalings

arXiv:1905.03410 [pdf, ps, other]

Learning Erdős-Rényi Random Graphs via Edge Detecting Queries

Authors: Zihan Li, Matthias Fresacher, Jonathan Scarlett

Abstract: In this paper, we consider the problem of learning an unknown graph via queries on groups of nodes, with the result indicating whether or not at least one edge is present among those nodes. While learning arbitrary graphs with $n$ nodes and $k$ edges is known to be hard in the sense of requiring $Ω( \min\{ k^2 \log n, n^2\})$ tests (even when a small probability of error is allowed), we show that… ▽ More In this paper, we consider the problem of learning an unknown graph via queries on groups of nodes, with the result indicating whether or not at least one edge is present among those nodes. While learning arbitrary graphs with $n$ nodes and $k$ edges is known to be hard in the sense of requiring $Ω( \min\{ k^2 \log n, n^2\})$ tests (even when a small probability of error is allowed), we show that learning an Erdős-Rényi random graph with an average of $\bar{k}$ edges is much easier; namely, one can attain asymptotically vanishing error probability with only $O(\bar{k}\log n)$ tests. We establish such bounds for a variety of algorithms inspired by the group testing problem, with explicit constant factors indicating a near-optimal number of tests, and in some cases asymptotic optimality including constant factors. In addition, we present an alternative design that permits a near-optimal sublinear decoding time of $O(\bar{k} \log^2 \bar{k} + \bar{k} \log n)$. △ Less

Submitted 3 January, 2020; v1 submitted 8 May, 2019; originally announced May 2019.

Comments: NeurIPS 2019

arXiv:1901.10647 [pdf, ps, other]

Support Recovery in the Phase Retrieval Model: Information-Theoretic Fundamental Limits

Authors: Lan V. Truong, Jonathan Scarlett

Abstract: The support recovery problem consists of determining a sparse subset of variables that is relevant in generating a set of observations. In this paper, we study the support recovery problem in the phase retrieval model consisting of noisy phaseless measurements, which arises in a diverse range of settings such as optical detection, X-ray crystallography, electron microscopy, and coherent diffractiv… ▽ More The support recovery problem consists of determining a sparse subset of variables that is relevant in generating a set of observations. In this paper, we study the support recovery problem in the phase retrieval model consisting of noisy phaseless measurements, which arises in a diverse range of settings such as optical detection, X-ray crystallography, electron microscopy, and coherent diffractive imaging. Our focus is on information-theoretic fundamental limits under an approximate recovery criterion, considering both discrete and Gaussian models for the sparse non-zero entries, along with Gaussian measurement matrices. In both cases, our bounds provide sharp thresholds with near-matching constant factors in several scaling regimes on the sparsity and signal-to-noise ratio. As a key step towards obtaining these results, we develop new concentration bounds for the conditional information content of log-concave random variables, which may be of independent interest. △ Less

Submitted 27 September, 2020; v1 submitted 29 January, 2019; originally announced January 2019.

Comments: Accepted to IEEE Transactions on Information Theory

arXiv:1901.00555 [pdf, other]

An Introductory Guide to Fano's Inequality with Applications in Statistical Estimation

Authors: Jonathan Scarlett, Volkan Cevher

Abstract: Information theory plays an indispensable role in the development of algorithm-independent impossibility results, both for communication problems and for seemingly distinct areas such as statistics and machine learning. While numerous information-theoretic tools have been proposed for this purpose, the oldest one remains arguably the most versatile and widespread: Fano's inequality. In this chapte… ▽ More Information theory plays an indispensable role in the development of algorithm-independent impossibility results, both for communication problems and for seemingly distinct areas such as statistics and machine learning. While numerous information-theoretic tools have been proposed for this purpose, the oldest one remains arguably the most versatile and widespread: Fano's inequality. In this chapter, we provide a survey of Fano's inequality and its variants in the context of statistical estimation, adopting a versatile framework that covers a wide range of specific problems. We present a variety of key tools and techniques used for establishing impossibility results via this approach, and provide representative examples covering group testing, graphical model selection, sparse linear regression, density estimation, and convex optimization. △ Less

Submitted 25 November, 2019; v1 submitted 2 January, 2019; originally announced January 2019.

Comments: Chapter in upcoming book "Information-Theoretic Methods in Data Science" (Cambridge University Press) edited by Yonina Eldar and Miguel Rodrigues. (v2 & v3) Minor corrections and edits

arXiv:1810.10775 [pdf, other]

Adversarially Robust Optimization with Gaussian Processes

Authors: Ilija Bogunovic, Jonathan Scarlett, Stefanie Jegelka, Volkan Cevher

Abstract: In this paper, we consider the problem of Gaussian process (GP) optimization with an added robustness requirement: The returned point may be perturbed by an adversary, and we require the function value to remain as high as possible even after this perturbation. This problem is motivated by settings in which the underlying functions during optimization and implementation stages are different, or wh… ▽ More In this paper, we consider the problem of Gaussian process (GP) optimization with an added robustness requirement: The returned point may be perturbed by an adversary, and we require the function value to remain as high as possible even after this perturbation. This problem is motivated by settings in which the underlying functions during optimization and implementation stages are different, or when one is interested in finding an entire region of good inputs rather than only a single point. We show that standard GP optimization algorithms do not exhibit the desired robustness properties, and provide a novel confidence-bound based algorithm StableOpt for this purpose. We rigorously establish the required number of samples for StableOpt to find a near-optimal point, and we complement this guarantee with an algorithm-independent lower bound. We experimentally demonstrate several potential applications of interest using real-world data sets, and we show that StableOpt consistently succeeds in finding a stable maximizer where several baseline methods fail. △ Less

Submitted 1 November, 2018; v1 submitted 25 October, 2018; originally announced October 2018.

Comments: Corrected typos

arXiv:1805.11792 [pdf, other]

Tight Regret Bounds for Bayesian Optimization in One Dimension

Authors: Jonathan Scarlett

Abstract: We consider the problem of Bayesian optimization (BO) in one dimension, under a Gaussian process prior and Gaussian sampling noise. We provide a theoretical analysis showing that, under fairly mild technical assumptions on the kernel, the best possible cumulative regret up to time $T$ behaves as $Ω(\sqrt{T})$ and $O(\sqrt{T\log T})$. This gives a tight characterization up to a $\sqrt{\log T}$ fact… ▽ More We consider the problem of Bayesian optimization (BO) in one dimension, under a Gaussian process prior and Gaussian sampling noise. We provide a theoretical analysis showing that, under fairly mild technical assumptions on the kernel, the best possible cumulative regret up to time $T$ behaves as $Ω(\sqrt{T})$ and $O(\sqrt{T\log T})$. This gives a tight characterization up to a $\sqrt{\log T}$ factor, and includes the first non-trivial lower bound for noisy BO. Our assumptions are satisfied, for example, by the squared exponential and Matérn-$ν$ kernels, with the latter requiring $ν> 2$. Our results certify the near-optimality of existing bounds (Srinivas {\em et al.}, 2009) for the SE kernel, while proving them to be strictly suboptimal for the Matérn kernel with $ν> 2$. △ Less

Submitted 16 December, 2019; v1 submitted 29 May, 2018; originally announced May 2018.

Comments: ICML 2018 + supplementary material

arXiv:1802.07028 [pdf, ps, other]

High-Dimensional Bayesian Optimization via Additive Models with Overlap** Groups

Authors: Paul Rolland, Jonathan Scarlett, Ilija Bogunovic, Volkan Cevher

Abstract: Bayesian optimization (BO) is a popular technique for sequential black-box function optimization, with applications including parameter tuning, robotics, environmental monitoring, and more. One of the most important challenges in BO is the development of algorithms that scale to high dimensions, which remains a key open problem despite recent progress. In this paper, we consider the approach of Ka… ▽ More Bayesian optimization (BO) is a popular technique for sequential black-box function optimization, with applications including parameter tuning, robotics, environmental monitoring, and more. One of the most important challenges in BO is the development of algorithms that scale to high dimensions, which remains a key open problem despite recent progress. In this paper, we consider the approach of Kandasamy et al. (2015), in which the high-dimensional function decomposes as a sum of lower-dimensional functions on subsets of the underlying variables. In particular, we significantly generalize this approach by lifting the assumption that the subsets are disjoint, and consider additive models with arbitrary overlap among the subsets. By representing the dependencies via a graph, we deduce an efficient message passing algorithm for optimizing the acquisition function. In addition, we provide an algorithm for learning the graph from samples based on Gibbs sampling. We empirically demonstrate the effectiveness of our methods on both synthetic and real-world data. △ Less

Submitted 28 March, 2018; v1 submitted 20 February, 2018; originally announced February 2018.

arXiv:1710.06766 [pdf, other]

Phase Transitions in the Pooled Data Problem

Authors: Jonathan Scarlett, Volkan Cevher

Abstract: In this paper, we study the pooled data problem of identifying the labels associated with a large collection of items, based on a sequence of pooled tests revealing the counts of each label within the pool. In the noiseless setting, we identify an exact asymptotic threshold on the required number of tests with optimal decoding, and prove a phase transition between complete success and complete fai… ▽ More In this paper, we study the pooled data problem of identifying the labels associated with a large collection of items, based on a sequence of pooled tests revealing the counts of each label within the pool. In the noiseless setting, we identify an exact asymptotic threshold on the required number of tests with optimal decoding, and prove a phase transition between complete success and complete failure. In addition, we present a novel noisy variation of the problem, and provide an information-theoretic framework for characterizing the required number of tests for general random noise models. Our results reveal that noise can make the problem considerably more difficult, with strict increases in the scaling laws even at low noise levels. Finally, we demonstrate similar behavior in an approximate recovery setting, where a given number of errors is allowed in the decoded labels. △ Less

Submitted 18 October, 2017; originally announced October 2017.

Comments: Accepted to NIPS 2017

arXiv:1706.04918 [pdf, other]

Robust Submodular Maximization: A Non-Uniform Partitioning Approach

Authors: Ilija Bogunovic, Slobodan Mitrović, Jonathan Scarlett, Volkan Cevher

Abstract: We study the problem of maximizing a monotone submodular function subject to a cardinality constraint $k$, with the added twist that a number of items $τ$ from the returned set may be removed. We focus on the worst-case setting considered in (Orlin et al., 2016), in which a constant-factor approximation guarantee was given for $τ= o(\sqrt{k})$. In this paper, we solve a key open problem raised the… ▽ More We study the problem of maximizing a monotone submodular function subject to a cardinality constraint $k$, with the added twist that a number of items $τ$ from the returned set may be removed. We focus on the worst-case setting considered in (Orlin et al., 2016), in which a constant-factor approximation guarantee was given for $τ= o(\sqrt{k})$. In this paper, we solve a key open problem raised therein, presenting a new Partitioned Robust (PRo) submodular maximization algorithm that achieves the same guarantee for more general $τ= o(k)$. Our algorithm constructs partitions consisting of buckets with exponentially increasing sizes, and applies standard submodular optimization subroutines on the buckets in order to construct the robust solution. We numerically demonstrate the performance of PRo in data summarization and influence maximization, demonstrating gains over both the greedy algorithm and the algorithm of (Orlin et al., 2016). △ Less

Submitted 15 June, 2017; originally announced June 2017.

Comments: Accepted to ICML 2017

arXiv:1706.00090 [pdf, other]

Lower Bounds on Regret for Noisy Gaussian Process Bandit Optimization

Authors: Jonathan Scarlett, Ilijia Bogunovic, Volkan Cevher

Abstract: In this paper, we consider the problem of sequentially optimizing a black-box function $f$ based on noisy samples and bandit feedback. We assume that $f$ is smooth in the sense of having a bounded norm in some reproducing kernel Hilbert space (RKHS), yielding a commonly-considered non-Bayesian form of Gaussian process bandit optimization. We provide algorithm-independent lower bounds on the simple… ▽ More In this paper, we consider the problem of sequentially optimizing a black-box function $f$ based on noisy samples and bandit feedback. We assume that $f$ is smooth in the sense of having a bounded norm in some reproducing kernel Hilbert space (RKHS), yielding a commonly-considered non-Bayesian form of Gaussian process bandit optimization. We provide algorithm-independent lower bounds on the simple regret, measuring the suboptimality of a single point reported after $T$ rounds, and on the cumulative regret, measuring the sum of regrets over the $T$ chosen points. For the isotropic squared-exponential kernel in $d$ dimensions, we find that an average simple regret of $ε$ requires $T = Ω\big(\frac{1}{ε^2} (\log\frac{1}ε)^{d/2}\big)$, and the average cumulative regret is at least $Ω\big( \sqrt{T(\log T)^{d/2}} \big)$, thus matching existing upper bounds up to the replacement of $d/2$ by $2d+O(1)$ in both cases. For the Matérn-$ν$ kernel, we give analogous bounds of the form $Ω\big( (\frac{1}ε)^{2+d/ν}\big)$ and $Ω\big( T^{\frac{ν+ d}{2ν+ d}} \big)$, and discuss the resulting gaps to the existing upper bounds. △ Less

Submitted 31 May, 2018; v1 submitted 31 May, 2017; originally announced June 2017.

Comments: Appearing in COLT 2017. This version corrects a few minor mistakes in Table I, which summarizes the new and existing regret bounds

arXiv:1610.07379 [pdf, other]

Truncated Variance Reduction: A Unified Approach to Bayesian Optimization and Level-Set Estimation

Authors: Ilija Bogunovic, Jonathan Scarlett, Andreas Krause, Volkan Cevher

Abstract: We present a new algorithm, truncated variance reduction (TruVaR), that treats Bayesian optimization (BO) and level-set estimation (LSE) with Gaussian processes in a unified fashion. The algorithm greedily shrinks a sum of truncated variances within a set of potential maximizers (BO) or unclassified points (LSE), which is updated based on confidence bounds. TruVaR is effective in several important… ▽ More We present a new algorithm, truncated variance reduction (TruVaR), that treats Bayesian optimization (BO) and level-set estimation (LSE) with Gaussian processes in a unified fashion. The algorithm greedily shrinks a sum of truncated variances within a set of potential maximizers (BO) or unclassified points (LSE), which is updated based on confidence bounds. TruVaR is effective in several important settings that are typically non-trivial to incorporate into myopic algorithms, including pointwise costs and heteroscedastic noise. We provide a general theoretical guarantee for TruVaR covering these aspects, and use it to recover and strengthen existing results on BO and LSE. Moreover, we provide a new result for a setting where one can select from a number of noise levels having associated costs. We demonstrate the effectiveness of the algorithm on both synthetic and real-world data sets. △ Less

Submitted 24 October, 2016; originally announced October 2016.

Comments: Accepted to NIPS 2016

arXiv:1607.02413 [pdf, other]

Lower Bounds on Active Learning for Graphical Model Selection

Authors: Jonathan Scarlett, Volkan Cevher

Abstract: We consider the problem of estimating the underlying graph associated with a Markov random field, with the added twist that the decoding algorithm can iteratively choose which subsets of nodes to sample based on the previous samples, resulting in an active learning setting. Considering both Ising and Gaussian models, we provide algorithm-independent lower bounds for high-probability recovery withi… ▽ More We consider the problem of estimating the underlying graph associated with a Markov random field, with the added twist that the decoding algorithm can iteratively choose which subsets of nodes to sample based on the previous samples, resulting in an active learning setting. Considering both Ising and Gaussian models, we provide algorithm-independent lower bounds for high-probability recovery within the class of degree-bounded graphs. Our main results are minimax lower bounds for the active setting that match the best known lower bounds for the passive setting, which in turn are known to be tight in several cases of interest. Our analysis is based on Fano's inequality, along with novel mutual information bounds for the active learning setting, and the application of restricted graph ensembles. While we consider ensembles that are similar or identical to those used in the passive setting, we require different analysis techniques, with a key challenge being bounding a mutual information quantity associated with observed subsets of nodes, as opposed to full observations. △ Less

Submitted 7 February, 2017; v1 submitted 8 July, 2016; originally announced July 2016.

Comments: Accepted to AISTATS 2017

arXiv:1602.03647 [pdf, other]

On the Difficulty of Selecting Ising Models with Approximate Recovery

Authors: Jonathan Scarlett, Volkan Cevher

Abstract: In this paper, we consider the problem of estimating the underlying graph associated with an Ising model given a number of independent and identically distributed samples. We adopt an \emph{approximate recovery} criterion that allows for a number of missed edges or incorrectly-included edges, in contrast with the widely-studied exact recovery problem. Our main results provide information-theoretic… ▽ More In this paper, we consider the problem of estimating the underlying graph associated with an Ising model given a number of independent and identically distributed samples. We adopt an \emph{approximate recovery} criterion that allows for a number of missed edges or incorrectly-included edges, in contrast with the widely-studied exact recovery problem. Our main results provide information-theoretic lower bounds on the sample complexity for graph classes imposing constraints on the number of edges, maximal degree, and other properties. We identify a broad range of scenarios where, either up to constant factors or logarithmic factors, our lower bounds match the best known lower bounds for the exact recovery criterion, several of which are known to be tight or near-tight. Hence, in these cases, approximate recovery has a similar difficulty to exact recovery in the minimax sense. Our bounds are obtained via a modification of Fano's inequality for handling the approximate recovery criterion, along with suitably-designed ensembles of graphs that can broadly be classed into two categories: (i) Those containing graphs that contain several isolated edges or cliques and are thus difficult to distinguish from the empty graph; (ii) Those containing graphs for which certain groups of nodes are highly correlated, thus making it difficult to determine precisely which edges connect them. We support our theoretical results on these ensembles with numerical experiments. △ Less

Submitted 8 July, 2016; v1 submitted 11 February, 2016; originally announced February 2016.

Comments: Accepted to IEEE Transactions on Signal and Information Processing over Networks (TSIPN)

arXiv:1602.00877 [pdf, other]

Partial Recovery Bounds for the Sparse Stochastic Block Model

Authors: Jonathan Scarlett, Volkan Cevher

Abstract: In this paper, we study the information-theoretic limits of community detection in the symmetric two-community stochastic block model, with intra-community and inter-community edge probabilities $\frac{a}{n}$ and $\frac{b}{n}$ respectively. We consider the sparse setting, in which $a$ and $b$ do not scale with $n$, and provide upper and lower bounds on the proportion of community labels recovered… ▽ More In this paper, we study the information-theoretic limits of community detection in the symmetric two-community stochastic block model, with intra-community and inter-community edge probabilities $\frac{a}{n}$ and $\frac{b}{n}$ respectively. We consider the sparse setting, in which $a$ and $b$ do not scale with $n$, and provide upper and lower bounds on the proportion of community labels recovered on average. We provide a numerical example for which the bounds are near-matching for moderate values of $a - b$, and matching in the limit as $a-b$ grows large. △ Less

Submitted 4 April, 2016; v1 submitted 2 February, 2016; originally announced February 2016.

Comments: Accepted to ISIT 2016

arXiv:1601.06650 [pdf, other]

Time-Varying Gaussian Process Bandit Optimization

Authors: Ilija Bogunovic, Jonathan Scarlett, Volkan Cevher

Abstract: We consider the sequential Bayesian optimization problem with bandit feedback, adopting a formulation that allows for the reward function to vary with time. We model the reward function using a Gaussian process whose evolution obeys a simple Markov model. We introduce two natural extensions of the classical Gaussian process upper confidence bound (GP-UCB) algorithm. The first, R-GP-UCB, resets GP-… ▽ More We consider the sequential Bayesian optimization problem with bandit feedback, adopting a formulation that allows for the reward function to vary with time. We model the reward function using a Gaussian process whose evolution obeys a simple Markov model. We introduce two natural extensions of the classical Gaussian process upper confidence bound (GP-UCB) algorithm. The first, R-GP-UCB, resets GP-UCB at regular intervals. The second, TV-GP-UCB, instead forgets about old data in a smooth fashion. Our main contribution comprises of novel regret bounds for these algorithms, providing an explicit characterization of the trade-off between the time horizon and the rate at which the function varies. We illustrate the performance of the algorithms on both synthetic and real data, and we find the gradual forgetting of TV-GP-UCB to perform favorably compared to the sharp resetting of R-GP-UCB. Moreover, both algorithms significantly outperform classical GP-UCB, since it treats stale and fresh data equally. △ Less

Submitted 25 January, 2016; originally announced January 2016.

Comments: To appear in AISTATS 2016

arXiv:1510.06188 [pdf, other]

doi 10.1109/JSTSP.2016.2548442

Learning-based Compressive Subsampling

Authors: Luca Baldassarre, Yen-Huan Li, Jonathan Scarlett, Baran Gözcü, Ilija Bogunovic, Volkan Cevher

Abstract: The problem of recovering a structured signal $\mathbf{x} \in \mathbb{C}^p$ from a set of dimensionality-reduced linear measurements $\mathbf{b} = \mathbf {A}\mathbf {x}$ arises in a variety of applications, such as medical imaging, spectroscopy, Fourier optics, and computerized tomography. Due to computational and storage complexity or physical constraints imposed by the problem, the measurement… ▽ More The problem of recovering a structured signal $\mathbf{x} \in \mathbb{C}^p$ from a set of dimensionality-reduced linear measurements $\mathbf{b} = \mathbf {A}\mathbf {x}$ arises in a variety of applications, such as medical imaging, spectroscopy, Fourier optics, and computerized tomography. Due to computational and storage complexity or physical constraints imposed by the problem, the measurement matrix $\mathbf{A} \in \mathbb{C}^{n \times p}$ is often of the form $\mathbf{A} = \mathbf{P}_Ω\boldsymbolΨ$ for some orthonormal basis matrix $\boldsymbolΨ\in \mathbb{C}^{p \times p}$ and subsampling operator $\mathbf{P}_Ω: \mathbb{C}^{p} \rightarrow \mathbb{C}^{n}$ that selects the rows indexed by $Ω$. This raises the fundamental question of how best to choose the index set $Ω$ in order to optimize the recovery performance. Previous approaches to addressing this question rely on non-uniform \emph{random} subsampling using application-specific knowledge of the structure of $\mathbf{x}$. In this paper, we instead take a principled learning-based approach in which a \emph{fixed} index set is chosen based on a set of training signals $\mathbf{x}_1,\dotsc,\mathbf{x}_m$. We formulate combinatorial optimization problems seeking to maximize the energy captured in these signals in an average-case or worst-case sense, and we show that these can be efficiently solved either exactly or approximately via the identification of modularity and submodularity structures. We provide both deterministic and statistical theoretical guarantees showing how the resulting measurement matrices perform on signals differing from the training signals, and we provide numerical examples showing our approach to be effective on a variety of data sets. △ Less

Submitted 28 March, 2016; v1 submitted 21 October, 2015; originally announced October 2015.

Comments: Submitted to IEEE Journal on Selected Topics in Signal Processing

arXiv:1501.07440 [pdf, other]

Limits on Support Recovery with Probabilistic Models: An Information-Theoretic Framework

Authors: Jonathan Scarlett, Volkan Cevher

Abstract: The support recovery problem consists of determining a sparse subset of a set of variables that is relevant in generating a set of observations, and arises in a diverse range of settings such as compressive sensing, and subset selection in regression, and group testing. In this paper, we take a unified approach to support recovery problems, considering general probabilistic models relating a spars… ▽ More The support recovery problem consists of determining a sparse subset of a set of variables that is relevant in generating a set of observations, and arises in a diverse range of settings such as compressive sensing, and subset selection in regression, and group testing. In this paper, we take a unified approach to support recovery problems, considering general probabilistic models relating a sparse data vector to an observation vector. We study the information-theoretic limits of both exact and partial support recovery, taking a novel approach motivated by thresholding techniques in channel coding. We provide general achievability and converse bounds characterizing the trade-off between the error probability and number of measurements, and we specialize these to the linear, 1-bit, and group testing models. In several cases, our bounds not only provide matching scaling laws in the necessary and sufficient number of measurements, but also sharp thresholds with matching constant factors. Our approach has several advantages over previous approaches: For the achievability part, we obtain sharp thresholds under broader scalings of the sparsity level and other parameters (e.g., signal-to-noise ratio) compared to several previous works, and for the converse part, we not only provide conditions under which the error probability fails to vanish, but also conditions under which it tends to one. △ Less

Submitted 30 August, 2016; v1 submitted 29 January, 2015; originally announced January 2015.

Comments: Accepted to IEEE Transactions on Information Theory; presented in part at ISIT 2015 and SODA 2016

Showing 1–46 of 46 results for author: Scarlett, J