Search | arXiv e-print repository

Stop** Bayesian Optimization with Probabilistic Regret Bounds

Abstract: Bayesian optimization is a popular framework for efficiently finding high-quality solutions to difficult problems based on limited prior information. As a rule, these algorithms operate by iteratively choosing what to try next until some predefined budget has been exhausted. We investigate replacing this de facto stop** rule with an $(ε, δ)$-criterion: stop when a solution has been found whose v… ▽ More Bayesian optimization is a popular framework for efficiently finding high-quality solutions to difficult problems based on limited prior information. As a rule, these algorithms operate by iteratively choosing what to try next until some predefined budget has been exhausted. We investigate replacing this de facto stop** rule with an $(ε, δ)$-criterion: stop when a solution has been found whose value is within $ε> 0$ of the optimum with probability at least $1 - δ$ under the model. Given access to the prior distribution of problems, we show how to verify this condition in practice using a limited number of draws from the posterior. For Gaussian process priors, we prove that Bayesian optimization with the proposed criterion stops in finite time and returns a point that satisfies the $(ε, δ)$-criterion under mild assumptions. These findings are accompanied by extensive empirical results which demonstrate the strengths and weaknesses of this approach. △ Less

Submitted 26 February, 2024; originally announced February 2024.

Comments: Submitted to ICML 2024

arXiv:2309.00854 [pdf, other]

A Unifying Variational Framework for Gaussian Process Motion Planning

Authors: Lucas Cosier, Rares Iordan, Sicelukwanda Zwane, Giovanni Franzese, James T. Wilson, Marc Peter Deisenroth, Alexander Terenin, Yasemin Bekiroglu

Abstract: To control how a robot moves, motion planning algorithms must compute paths in high-dimensional state spaces while accounting for physical constraints related to motors and joints, generating smooth and stable motions, avoiding obstacles, and preventing collisions. A motion planning algorithm must therefore balance competing demands, and should ideally incorporate uncertainty to handle noise, mode… ▽ More To control how a robot moves, motion planning algorithms must compute paths in high-dimensional state spaces while accounting for physical constraints related to motors and joints, generating smooth and stable motions, avoiding obstacles, and preventing collisions. A motion planning algorithm must therefore balance competing demands, and should ideally incorporate uncertainty to handle noise, model errors, and facilitate deployment in complex environments. To address these issues, we introduce a framework for robot motion planning based on variational Gaussian processes, which unifies and generalizes various probabilistic-inference-based motion planning algorithms, and connects them with optimization-based planners. Our framework provides a principled and flexible way to incorporate equality-based, inequality-based, and soft motion-planning constraints during end-to-end training, is straightforward to implement, and provides both interval-based and Monte-Carlo-based uncertainty estimates. We conduct experiments using different environments and robots, comparing against baseline approaches based on the feasibility of the planned paths, and obstacle avoidance quality. Results show that our proposed approach yields a good balance between success rates and path quality. △ Less

Submitted 8 March, 2024; v1 submitted 2 September, 2023; originally announced September 2023.

Comments: Code and supplementary video available at: https://github.com/luke-ck/vgpmp

Journal ref: Artificial Intelligence and Statistics, 2024

arXiv:2011.04026 [pdf, other]

Pathwise Conditioning of Gaussian Processes

Authors: James T. Wilson, Viacheslav Borovitskiy, Alexander Terenin, Peter Mostowsky, Marc Peter Deisenroth

Abstract: As Gaussian processes are used to answer increasingly complex questions, analytic solutions become scarcer and scarcer. Monte Carlo methods act as a convenient bridge for connecting intractable mathematical expressions with actionable estimates via sampling. Conventional approaches for simulating Gaussian process posteriors view samples as draws from marginal distributions of process values at fin… ▽ More As Gaussian processes are used to answer increasingly complex questions, analytic solutions become scarcer and scarcer. Monte Carlo methods act as a convenient bridge for connecting intractable mathematical expressions with actionable estimates via sampling. Conventional approaches for simulating Gaussian process posteriors view samples as draws from marginal distributions of process values at finite sets of input locations. This distribution-centric characterization leads to generative strategies that scale cubically in the size of the desired random vector. These methods are prohibitively expensive in cases where we would, ideally, like to draw high-dimensional vectors or even continuous sample paths. In this work, we investigate a different line of reasoning: rather than focusing on distributions, we articulate Gaussian conditionals at the level of random variables. We show how this pathwise interpretation of conditioning gives rise to a general family of approximations that lend themselves to efficiently sampling Gaussian process posteriors. Starting from first principles, we derive these methods and analyze the approximation errors they introduce. We, then, ground these results by exploring the practical implications of pathwise conditioning in various applied settings, such as global optimization and reinforcement learning. △ Less

Submitted 30 July, 2021; v1 submitted 8 November, 2020; originally announced November 2020.

Journal ref: Journal of Machine Learning Research, 22(105):1-47, 2021

arXiv:2002.09309 [pdf, other]

Efficiently Sampling Functions from Gaussian Process Posteriors

Authors: James T. Wilson, Viacheslav Borovitskiy, Alexander Terenin, Peter Mostowsky, Marc Peter Deisenroth

Abstract: Gaussian processes are the gold standard for many real-world modeling problems, especially in cases where a model's success hinges upon its ability to faithfully represent predictive uncertainty. These problems typically exist as parts of larger frameworks, wherein quantities of interest are ultimately defined by integrating over posterior distributions. These quantities are frequently intractable… ▽ More Gaussian processes are the gold standard for many real-world modeling problems, especially in cases where a model's success hinges upon its ability to faithfully represent predictive uncertainty. These problems typically exist as parts of larger frameworks, wherein quantities of interest are ultimately defined by integrating over posterior distributions. These quantities are frequently intractable, motivating the use of Monte Carlo methods. Despite substantial progress in scaling up Gaussian processes to large training sets, methods for accurately generating draws from their posterior distributions still scale cubically in the number of test locations. We identify a decomposition of Gaussian processes that naturally lends itself to scalable sampling by separating out the prior from the data. Building off of this factorization, we propose an easy-to-use and general-purpose approach for fast posterior sampling, which seamlessly pairs with sparse approximations to afford scalability both during training and at test time. In a series of experiments designed to test competing sampling schemes' statistical properties and practical ramifications, we demonstrate how decoupled sample paths accurately represent Gaussian process posteriors at a fraction of the usual cost. △ Less

Submitted 16 August, 2020; v1 submitted 21 February, 2020; originally announced February 2020.

Journal ref: International Conference on Machine Learning, 2020

arXiv:1805.10196 [pdf, other]

Maximizing acquisition functions for Bayesian optimization

Authors: James T. Wilson, Frank Hutter, Marc Peter Deisenroth

Abstract: Bayesian optimization is a sample-efficient approach to global optimization that relies on theoretically motivated value heuristics (acquisition functions) to guide its search process. Fully maximizing acquisition functions produces the Bayes' decision rule, but this ideal is difficult to achieve since these functions are frequently non-trivial to optimize. This statement is especially true when e… ▽ More Bayesian optimization is a sample-efficient approach to global optimization that relies on theoretically motivated value heuristics (acquisition functions) to guide its search process. Fully maximizing acquisition functions produces the Bayes' decision rule, but this ideal is difficult to achieve since these functions are frequently non-trivial to optimize. This statement is especially true when evaluating queries in parallel, where acquisition functions are routinely non-convex, high-dimensional, and intractable. We first show that acquisition functions estimated via Monte Carlo integration are consistently amenable to gradient-based optimization. Subsequently, we identify a common family of acquisition functions, including EI and UCB, whose properties not only facilitate but justify use of greedy approaches for their maximization. △ Less

Submitted 2 December, 2018; v1 submitted 25 May, 2018; originally announced May 2018.

Comments: Proceedings of the Thirty-second Conference on Neural Information Processing Systems, 2018

arXiv:1712.00424 [pdf, other]

The reparameterization trick for acquisition functions

Authors: James T. Wilson, Riccardo Moriconi, Frank Hutter, Marc Peter Deisenroth

Abstract: Bayesian optimization is a sample-efficient approach to solving global optimization problems. Along with a surrogate model, this approach relies on theoretically motivated value heuristics (acquisition functions) to guide the search process. Maximizing acquisition functions yields the best performance; unfortunately, this ideal is difficult to achieve since optimizing acquisition functions per se… ▽ More Bayesian optimization is a sample-efficient approach to solving global optimization problems. Along with a surrogate model, this approach relies on theoretically motivated value heuristics (acquisition functions) to guide the search process. Maximizing acquisition functions yields the best performance; unfortunately, this ideal is difficult to achieve since optimizing acquisition functions per se is frequently non-trivial. This statement is especially true in the parallel setting, where acquisition functions are routinely non-convex, high-dimensional, and intractable. Here, we demonstrate how many popular acquisition functions can be formulated as Gaussian integrals amenable to the reparameterization trick and, ensuingly, gradient-based optimization. Further, we use this reparameterized representation to derive an efficient Monte Carlo estimator for the upper confidence bound acquisition function in the context of parallel selection. △ Less

Submitted 1 December, 2017; originally announced December 2017.

Comments: Accepted at the NIPS 2017 Workshop on Bayesian Optimization (BayesOpt 2017)

arXiv:1506.04449 [pdf, other]

Compressing Convolutional Neural Networks

Authors: Wenlin Chen, James T. Wilson, Stephen Tyree, Kilian Q. Weinberger, Yixin Chen

Abstract: Convolutional neural networks (CNN) are increasingly used in many areas of computer vision. They are particularly attractive because of their ability to "absorb" great quantities of labeled data through millions of parameters. However, as model sizes increase, so do the storage and memory requirements of the classifiers. We present a novel network architecture, Frequency-Sensitive Hashed Nets (Fre… ▽ More Convolutional neural networks (CNN) are increasingly used in many areas of computer vision. They are particularly attractive because of their ability to "absorb" great quantities of labeled data through millions of parameters. However, as model sizes increase, so do the storage and memory requirements of the classifiers. We present a novel network architecture, Frequency-Sensitive Hashed Nets (FreshNets), which exploits inherent redundancy in both convolutional layers and fully-connected layers of a deep learning model, leading to dramatic savings in memory and storage consumption. Based on the key observation that the weights of learned convolutional filters are typically smooth and low-frequency, we first convert filter weights to the frequency domain with a discrete cosine transform (DCT) and use a low-cost hash function to randomly group frequency parameters into hash buckets. All parameters assigned the same hash bucket share a single value learned with standard back-propagation. To further reduce model size we allocate fewer hash buckets to high-frequency components, which are generally less important. We evaluate FreshNets on eight data sets, and show that it leads to drastically better compressed performance than several relevant baselines. △ Less

Submitted 14 June, 2015; originally announced June 2015.

arXiv:1504.04788 [pdf, other]

Compressing Neural Networks with the Hashing Trick

Authors: Wenlin Chen, James T. Wilson, Stephen Tyree, Kilian Q. Weinberger, Yixin Chen

Abstract: As deep nets are increasingly used in applications suited for mobile devices, a fundamental dilemma becomes apparent: the trend in deep learning is to grow models to absorb ever-increasing data set sizes; however mobile devices are designed with very little memory and cannot store such large models. We present a novel network architecture, HashedNets, that exploits inherent redundancy in neural ne… ▽ More As deep nets are increasingly used in applications suited for mobile devices, a fundamental dilemma becomes apparent: the trend in deep learning is to grow models to absorb ever-increasing data set sizes; however mobile devices are designed with very little memory and cannot store such large models. We present a novel network architecture, HashedNets, that exploits inherent redundancy in neural networks to achieve drastic reductions in model sizes. HashedNets uses a low-cost hash function to randomly group connection weights into hash buckets, and all connections within the same hash bucket share a single parameter value. These parameters are tuned to adjust to the HashedNets weight sharing architecture with standard backprop during training. Our hashing procedure introduces no additional memory overhead, and we demonstrate on several benchmark data sets that HashedNets shrink the storage requirements of neural networks substantially while mostly preserving generalization performance. △ Less

Submitted 19 April, 2015; originally announced April 2015.

Showing 1–8 of 8 results for author: Wilson, J T