-
Open Source Vizier: Distributed Infrastructure and API for Reliable and Flexible Blackbox Optimization
Authors:
Xingyou Song,
Sagi Perel,
Chansoo Lee,
Greg Kochanski,
Daniel Golovin
Abstract:
Vizier is the de-facto blackbox and hyperparameter optimization service across Google, having optimized some of Google's largest products and research efforts. To operate at the scale of tuning thousands of users' critical systems, Google Vizier solved key design challenges in providing multiple different features, while remaining fully fault-tolerant. In this paper, we introduce Open Source (OSS) Vizier, a standalone Python-based interface for blackbox optimization and research, based on the Google-internal Vizier infrastructure and framework. OSS Vizier provides an API capable of defining and solving a wide variety of optimization problems, including multi-metric, early stopping, transfer learning, and conditional search. Furthermore, it is designed to be a distributed system that assures reliability, and allows multiple parallel evaluations of the user's objective function. The flexible RPC-based infrastructure allows users to access OSS Vizier from binaries written in any language. OSS Vizier also provides a back-end ("Pythia") API that gives algorithm authors a way to interface new algorithms with the core OSS Vizier system. OSS Vizier is available at https://github.com/google/vizier.
Submitted 10 January, 2023; v1 submitted 27 July, 2022;
originally announced July 2022.
-
Towards Learning Universal Hyperparameter Optimizers with Transformers
Authors:
Yutian Chen,
Xingyou Song,
Chansoo Lee,
Zi Wang,
Qiuyi Zhang,
David Dohan,
Kazuya Kawakami,
Greg Kochanski,
Arnaud Doucet,
Marc'aurelio Ranzato,
Sagi Perel,
Nando de Freitas
Abstract:
Meta-learning hyperparameter optimization (HPO) algorithms from prior experiments is a promising approach to improve optimization efficiency over objective functions from a similar distribution. However, existing methods are restricted to learning from experiments sharing the same set of hyperparameters. In this paper, we introduce the OptFormer, the first text-based Transformer HPO framework that provides a universal end-to-end interface for jointly learning policy and function prediction when trained on vast tuning data from the wild, such as Google's Vizier database, one of the world's largest HPO datasets. Our extensive experiments demonstrate that the OptFormer can simultaneously imitate at least 7 different HPO algorithms, which can be further improved via its function uncertainty estimates. Compared to a Gaussian Process, the OptFormer also learns a robust prior distribution for hyperparameter response functions, and can thereby provide more accurate and better calibrated predictions. This work paves the path to future extensions for training a Transformer-based model as a general HPO optimizer.
Submitted 13 October, 2022; v1 submitted 26 May, 2022;
originally announced May 2022.
-
Gradientless Descent: High-Dimensional Zeroth-Order Optimization
Authors:
Daniel Golovin,
John Karro,
Greg Kochanski,
Chansoo Lee,
Xingyou Song,
Qiuyi Zhang
Abstract:
Zeroth-order optimization is the process of minimizing an objective $f(x)$, given oracle access to evaluations at adaptively chosen inputs $x$. In this paper, we present two simple yet powerful GradientLess Descent (GLD) algorithms that do not rely on an underlying gradient estimate and are numerically stable. We analyze our algorithm from a novel geometric perspective and present a novel analysis that shows convergence within an $\epsilon$-ball of the optimum in $O(kQ\log(n)\log(R/\epsilon))$ evaluations, for any monotone transform of a smooth and strongly convex objective with latent dimension $k < n$, where the input dimension is $n$, $R$ is the diameter of the input space and $Q$ is the condition number. Our rates are the first of their kind to be both 1) poly-logarithmically dependent on dimensionality and 2) invariant under monotone transformations. We further leverage our geometric perspective to show that our analysis is optimal. Both monotone invariance and the ability to utilize a low latent dimensionality are key to the empirical success of our algorithms, as demonstrated on BBOB and MuJoCo benchmarks.
Submitted 18 May, 2020; v1 submitted 14 November, 2019;
originally announced November 2019.
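The Gradientless Descent abstract above describes a search that uses only function evaluations and comparisons, with no gradient estimate. The following is a minimal sketch of that idea, not the authors' implementation: the function names (`sample_in_ball`, `gld_search`), the halving radius schedule, and all parameter names are assumptions made for illustration. Because candidates are accepted purely by comparing objective values, the accept/reject decisions do not depend on the scale of $f$, which is the intuition behind invariance under monotone transforms.

```python
import math
import random


def sample_in_ball(dim, radius, rng):
    """Draw a point uniformly from a dim-dimensional ball of the given radius."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in g)) or 1.0
    # Radius scaled by u^(1/dim) makes the sample uniform in volume.
    scale = radius * rng.random() ** (1.0 / dim) / norm
    return [c * scale for c in g]


def gld_search(f, x0, max_radius, min_radius, iterations, seed=0):
    """Comparison-based descent (hypothetical sketch, assumed radius schedule).

    Each iteration tries one random step at every radius
    max_radius, max_radius/2, ..., down to about min_radius,
    and moves to the best point seen if it improves on the current one.
    """
    rng = random.Random(seed)
    x = list(x0)
    fx = f(x)
    num_scales = max(1, math.ceil(math.log2(max_radius / min_radius)) + 1)
    for _ in range(iterations):
        best_x, best_f = x, fx
        for k in range(num_scales):
            step = sample_in_ball(len(x), max_radius * 2.0 ** (-k), rng)
            cand = [xi + si for xi, si in zip(x, step)]
            fc = f(cand)
            if fc < best_f:  # only comparisons of f values are used
                best_x, best_f = cand, fc
        x, fx = best_x, best_f
    return x, fx


if __name__ == "__main__":
    # Ill-conditioned quadratic used purely as a toy objective.
    def quadratic(x):
        return sum((i + 1) * xi * xi for i, xi in enumerate(x))

    start = [3.0, -2.0, 1.5, 0.5]
    x_best, f_best = gld_search(quadratic, start, 4.0, 1e-3, 200, seed=1)
    print(quadratic(start), f_best)
```

Trying every radius in each iteration avoids having to tune a step size: at least one scale is usually small enough to make progress, at the cost of a logarithmic number of extra evaluations per iteration.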
-
Categorical Judgment with a Variable Decision Rule
Authors:
Burton Rosner,
Greg Kochanski
Abstract:
A new Thurstonian rating scale model uses a variable decision rule (VDR) that incorporates three previously formulated, distinct decision rules. The model includes probabilities for choosing each rule, along with Gaussian representation and criterion densities. Numerical optimisation techniques were validated through demonstrating that the model fits simulated data tightly. For simulations with 400 trials per stimulus (tps), useful information emerged about the generating parameters. However, larger experiments (e.g. 4000 tps) proved desirable for better recovery of generating parameters and to support trustworthy choices between competing models by the Akaike Information Criterion. In reanalyses of experiments by others, the VDR model explained most of the data better than did classical signal detection theory models.
Submitted 13 December, 2015;
originally announced January 2016.
-
Using Mimicry to Learn about Mental Representations
Authors:
Greg Kochanski
Abstract:
Phonology typically describes speech in terms of discrete signs like features. The field of intonational phonology uses discrete accents to describe intonation and prosody. But are such representations useful? The results of mimicry experiments indicate that discrete signs are not a useful representation of the shape of intonation contours. Human behaviour seems to be better represented by attractors, where memory retains substantial fine detail about an utterance. There is no evidence that any discrete abstract representations that might be formed have an effect on the speech that is subsequently produced. This paper also discusses conditions under which a discrete phonology can arise from an attractor model and why, for intonation, attractors can be inferred without implying a discrete phonology.
Submitted 15 April, 2012;
originally announced April 2012.
-
Detecting gross alignment errors in the Spoken British National Corpus
Authors:
Ladan Baghai-Ravary,
Sergio Grau,
Greg Kochanski
Abstract:
The paper presents methods for evaluating the accuracy of alignments between transcriptions and audio recordings. The methods have been applied to the Spoken British National Corpus, which is an extensive and varied corpus of natural unscripted speech. Early results show good agreement with human ratings of alignment accuracy. The methods also provide an indication of the location of likely alignment problems; this should allow efficient manual examination of large corpora. Automatic checking of such alignments is crucial when analysing any very large corpus, since even the best current speech alignment systems will occasionally make serious errors. The methods described here use a hybrid approach based on statistics of the speech signal itself, statistics of the labels being evaluated, and statistics linking the two.
Submitted 9 January, 2011;
originally announced January 2011.
-
Should Corpora be Big, Rich, or Dense?
Authors:
Greg P. Kochanski,
Chilin Shih,
Ryan Shosted
Abstract:
In this paper, we ask what properties make a large corpus more or less useful. We suggest that size, by itself, should not be the ultimate goal of building a corpus. Large-scale corpora are considered desirable because they offer statistical stability and rich variation. But this rich variation means more factors to control and evaluate, which can limit the advantages of size. We discuss the use of multi-channel data to complement large-scale speech corpora. Even though multi-channel data may limit the scale of a corpus (due to the complex and labor-intensive nature of data collection), they can offer information that allows us to tease apart various factors related to speech production.
Submitted 13 December, 2010;
originally announced December 2010.
-
Bootstrap Markov chain Monte Carlo and optimal solutions for the Law of Categorical Judgment (Corrected)
Authors:
Greg Kochanski,
Burton S. Rosner
Abstract:
A novel procedure is described for accelerating the convergence of Markov chain Monte Carlo computations. The algorithm uses an adaptive bootstrap technique to generate candidate steps in the Markov Chain. It is efficient for symmetric, convex probability distributions, similar to multivariate Gaussians, and it can be used for Bayesian estimation or for obtaining maximum likelihood solutions with confidence limits. As a test case, the Law of Categorical Judgment (Corrected) was fitted with the algorithm to data sets from simulated rating scale experiments. The correct parameters were recovered from practical-sized data sets simulated for Full Signal Detection Theory and its special cases of standard Signal Detection Theory and Complementary Signal Detection Theory.
Submitted 9 August, 2010;
originally announced August 2010.
-
Detailed Mass Map of CL0024+1654 from Strong Lensing
Authors:
J. Anthony Tyson,
Greg P. Kochanski,
Ian P. Dell'Antonio
Abstract:
We construct a high resolution mass map of the z=0.39 cluster 0024+1654, based on parametric inversion of the associated gravitational lens. The lens creates eight well-resolved sub-images of a background galaxy, seen in deep imaging with HST. Excluding mass concentrations centered on visible galaxies, more than 98% of the remaining mass is represented by a smooth concentration of dark matter centered near the brightest cluster galaxies, with a 35 h^{-1} kpc soft core. The asymmetry in the mass distribution is <3% inside 107 h^{-1} kpc radius. The dark matter distribution we observe in CL0024 is far more smooth, symmetric, and nonsingular than in typical simulated clusters in either Omega=1 or Omega=0.3 CDM cosmologies. Integrated to 107 h^{-1} kpc radius, the rest-frame mass to light ratio is M/L_V = 276 \pm 40 h (M/L_V)_solar.
Submitted 2 May, 1998; v1 submitted 20 January, 1998;
originally announced January 1998.
-
Flickering Faint Galaxies: Few and Far Between
Authors:
Greg P. Kochanski,
J. Anthony Tyson,
Philippe Fischer
Abstract:
Optical variability in galaxies at high redshift is a tracer of evolution in AGN activity, and should provide a useful constraint on models of galaxy evolution, AGN structure, and cosmology. We studied optical variability in multiple deep CCD and photographic surveys of blank fields for galaxies with $B_j = 20 - 25$ mag. Weakly variable objects are far more common than strongly variable ones. For objects near $B_j = 22$, $0.74\% \pm 0.2 \%$ vary by 0.026~mag RMS or more, over a decade. This is small compared with previous claims based on photographic surveys, and also small compared with the fraction of bright quasars ($\approx 5\%$ at $B_j = 20$~mag) or Seyferts ($\approx 1-2\%$ for $B_j < 18$). The fraction of objects that vary increases slowly with magnitude. Detection probabilities and error rates were checked by simulations and statistical analysis of fluctuations of sample sky spots.
Submitted 30 January, 1996;
originally announced January 1996.
-
Optimal Addition of Images for Detection and Photometry
Authors:
Philippe Fischer,
Greg P. Kochanski
Abstract:
In this paper we describe weighting techniques used for the optimal coaddition of CCD frames with differing characteristics. Optimal means maximum signal-to-noise (s/n) for stellar objects. We derive formulae for four applications: 1) object detection via matched filter, 2) object detection identical to DAOFIND, 3) aperture photometry, and 4) ALLSTAR profile-fitting photometry. We have included examples involving 21 frames for which either the sky brightness or image resolution varied by a factor of three. The gains in s/n were modest for most of the examples, except for DAOFIND detection with varying image resolution which exhibited a substantial s/n increase. Even though the only consideration was maximizing s/n, the image resolution was seen to improve for most of the variable resolution examples. Also discussed are empirical fits for the weighting and the availability of the program, WEIGHT, used to generate the weighting for the individual frames. Finally, we include appendices describing the effects of clipping algorithms and a scheme for star/galaxy and cosmic ray/star discrimination.
Submitted 14 October, 1993;
originally announced October 1993.
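The coaddition abstract above defines "optimal" as maximum signal-to-noise when frames of differing quality are combined. As a rough illustration of that idea only, the sketch below uses textbook inverse-variance weighting for independent measurements of one source; it is not the paper's formulae and not the WEIGHT program. The function names and the per-frame inputs (`signals` as relative source signal, `sigmas` as noise standard deviation) are assumptions for the example.

```python
import math


def optimal_weights(signals, sigmas):
    """Weights proportional to signal / variance, normalized to sum to 1.

    For independent noise, this choice maximizes the s/n of the weighted sum.
    """
    raw = [s / (n * n) for s, n in zip(signals, sigmas)]
    total = sum(raw)
    return [w / total for w in raw]


def combined_snr(weights, signals, sigmas):
    """Signal-to-noise of a weighted sum of frames with independent noise."""
    signal = sum(w * s for w, s in zip(weights, signals))
    noise = math.sqrt(sum((w * n) ** 2 for w, n in zip(weights, sigmas)))
    return signal / noise


if __name__ == "__main__":
    # Three hypothetical frames: same source signal, noise varying by 3x.
    signals = [1.0, 1.0, 1.0]
    sigmas = [1.0, 2.0, 3.0]
    equal = [1.0 / 3.0] * 3
    best = optimal_weights(signals, sigmas)
    print(combined_snr(equal, signals, sigmas), combined_snr(best, signals, sigmas))
```

With these weights the combined s/n equals the quadrature sum of the per-frame s/n values, so a noisy frame never lowers the result; with equal weights it can.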