Search | arXiv e-print repository

Open Source Vizier: Distributed Infrastructure and API for Reliable and Flexible Blackbox Optimization

Authors: Xingyou Song, Sagi Perel, Chansoo Lee, Greg Kochanski, Daniel Golovin

Abstract: Vizier is the de-facto blackbox and hyperparameter optimization service across Google, having optimized some of Google's largest products and research efforts. To operate at the scale of tuning thousands of users' critical systems, Google Vizier solved key design challenges in providing multiple different features, while remaining fully fault-tolerant. In this paper, we introduce Open Source (OSS) Vizier, a standalone Python-based interface for blackbox optimization and research, based on the Google-internal Vizier infrastructure and framework. OSS Vizier provides an API capable of defining and solving a wide variety of optimization problems, including multi-metric, early stopping, transfer learning, and conditional search. Furthermore, it is designed to be a distributed system that assures reliability and allows multiple parallel evaluations of the user's objective function. The flexible RPC-based infrastructure allows users to access OSS Vizier from binaries written in any language. OSS Vizier also provides a back-end ("Pythia") API that gives algorithm authors a way to interface new algorithms with the core OSS Vizier system. OSS Vizier is available at https://github.com/google/vizier.

Submitted 10 January, 2023; v1 submitted 27 July, 2022; originally announced July 2022.

Comments: Published as a conference paper for the systems track at the 1st International Conference on Automated Machine Learning (AutoML-Conf 2022). Code can be found at https://github.com/google/vizier
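The abstract describes a service in which clients repeatedly ask for suggested trials, evaluate the objective (possibly in parallel), and report results back. A minimal, in-process sketch of that suggest/evaluate/complete cycle follows. The classes and methods (`Study`, `Trial`, `suggest`, `complete`, `best`) are hypothetical stand-ins, not the actual OSS Vizier client API, and plain random search stands in for a real algorithm served through Pythia.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Trial:
    trial_id: int
    parameters: Dict[str, float]
    objective: Optional[float] = None  # set once the trial is completed


@dataclass
class Study:
    """Hypothetical in-process stand-in for a study managed by an optimization service."""
    search_space: Dict[str, Tuple[float, float]]  # parameter name -> (low, high)
    seed: int = 0
    trials: List[Trial] = field(default_factory=list)

    def __post_init__(self) -> None:
        self._rng = random.Random(self.seed)

    def suggest(self, count: int = 1) -> List[Trial]:
        """Create `count` new trials; random search stands in for a real algorithm."""
        new_trials = []
        for _ in range(count):
            params = {name: self._rng.uniform(lo, hi)
                      for name, (lo, hi) in self.search_space.items()}
            trial = Trial(trial_id=len(self.trials) + 1, parameters=params)
            self.trials.append(trial)
            new_trials.append(trial)
        return new_trials

    def complete(self, trial: Trial, objective: float) -> None:
        trial.objective = objective

    def best(self) -> Trial:
        done = [t for t in self.trials if t.objective is not None]
        return min(done, key=lambda t: t.objective)  # minimization


def objective(params: Dict[str, float]) -> float:
    # Toy objective with its minimum at x = 0.3, y = -0.2.
    return (params["x"] - 0.3) ** 2 + (params["y"] + 0.2) ** 2


study = Study(search_space={"x": (-1.0, 1.0), "y": (-1.0, 1.0)}, seed=42)
for _ in range(50):
    # Request a batch of 4 suggestions, as 4 parallel workers might.
    for trial in study.suggest(count=4):
        study.complete(trial, objective(trial.parameters))

best = study.best()
print(best.trial_id, best.objective)
```

In a real distributed deployment the study state would live in a fault-tolerant server and each worker would talk to it over RPC; the sketch keeps everything in one process only to show the shape of the loop.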

arXiv:2205.13320 [pdf, other]

Towards Learning Universal Hyperparameter Optimizers with Transformers

Authors: Yutian Chen, Xingyou Song, Chansoo Lee, Zi Wang, Qiuyi Zhang, David Dohan, Kazuya Kawakami, Greg Kochanski, Arnaud Doucet, Marc'aurelio Ranzato, Sagi Perel, Nando de Freitas

Abstract: Meta-learning hyperparameter optimization (HPO) algorithms from prior experiments is a promising approach to improve optimization efficiency over objective functions from a similar distribution. However, existing methods are restricted to learning from experiments sharing the same set of hyperparameters. In this paper, we introduce the OptFormer, the first text-based Transformer HPO framework that provides a universal end-to-end interface for jointly learning policy and function prediction when trained on vast tuning data from the wild, such as Google's Vizier database, one of the world's largest HPO datasets. Our extensive experiments demonstrate that the OptFormer can simultaneously imitate at least 7 different HPO algorithms, which can be further improved via its function uncertainty estimates. Compared to a Gaussian Process, the OptFormer also learns a robust prior distribution for hyperparameter response functions, and can thereby provide more accurate and better calibrated predictions. This work paves the path to future extensions for training a Transformer-based model as a general HPO optimizer.

Submitted 13 October, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

Comments: Published as a conference paper in Neural Information Processing Systems (NeurIPS) 2022. Code can be found in https://github.com/google-research/optformer and Google AI Blog can be found in https://ai.googleblog.com/2022/08/optformer-towards-universal.html
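The key move in a text-based framework is that any tuning study, whatever its hyperparameters are called, can be written out as a string, so one sequence model can read studies whose search spaces differ. The serializer below illustrates only that idea. Its field layout, separators, and the name `serialize_study` are assumptions of this sketch and are not the OptFormer's actual text format.

```python
from typing import Dict, List, Union

Value = Union[float, int, str]


def serialize_study(name: str, metric: str,
                    trials: List[Dict[str, Value]], scores: List[float]) -> str:
    """Render a tuning study as plain text (hypothetical format)."""
    # Header: study-level metadata, including whatever parameter names this study uses.
    param_names = sorted({k for t in trials for k in t})
    lines = [f"study: {name} | metric: {metric} | params: {', '.join(param_names)}"]
    for i, (params, score) in enumerate(zip(trials, scores), start=1):
        # Each trial spells out its own parameters, so studies need not share a search space.
        body = ", ".join(f"{k}={params[k]}" for k in sorted(params))
        lines.append(f"trial {i}: {body} -> {metric}={score:.4g}")
    return "\n".join(lines)


text = serialize_study(
    name="cnn_tuning",
    metric="accuracy",
    trials=[{"learning_rate": 0.01, "optimizer": "adam"},
            {"learning_rate": 0.1, "optimizer": "sgd"}],
    scores=[0.912, 0.874],
)
print(text)
```

A study over entirely different parameters (say, tree depth and number of estimators) serializes through the same function, which is the property that lets a single model be trained across heterogeneous tuning data.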

arXiv:1911.06317 [pdf, other]

Gradientless Descent: High-Dimensional Zeroth-Order Optimization

Authors: Daniel Golovin, John Karro, Greg Kochanski, Chansoo Lee, Xingyou Song, Qiuyi Zhang

Abstract: Zeroth-order optimization is the process of minimizing an objective $f(x)$, given oracle access to evaluations at adaptively chosen inputs $x$. In this paper, we present two simple yet powerful GradientLess Descent (GLD) algorithms that do not rely on an underlying gradient estimate and are numerically stable. We analyze our algorithms from a novel geometric perspective and present a novel analysis that shows convergence within an $\epsilon$-ball of the optimum in $O(kQ\log(n)\log(R/\epsilon))$ evaluations, for any monotone transform of a smooth and strongly convex objective with latent dimension $k < n$, where the input dimension is $n$, $R$ is the diameter of the input space and $Q$ is the condition number. Our rates are the first of their kind to be both 1) poly-logarithmically dependent on dimensionality and 2) invariant under monotone transformations. We further leverage our geometric perspective to show that our analysis is optimal. Both monotone invariance and the ability to exploit a low latent dimensionality are key to the empirical success of our algorithms, as demonstrated on BBOB and MuJoCo benchmarks.

Submitted 18 May, 2020; v1 submitted 14 November, 2019; originally announced November 2019.

Comments: 11 main pages, 26 total pages

Journal ref: ICLR 2020 Spotlight
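A minimal sketch of the ball-sampling idea, written from our reading of the GLD-Search variant: at each iteration, draw one candidate uniformly from a ball at every radius in a geometric ladder $R, R/2, R/4, \dots$ down to a minimum radius, and move to the best of those candidates or stay put. The function name and parameters are ours. Because the loop only compares values of $f$ and never forms a gradient estimate, running it on any strictly increasing transform of $f$ with the same seed yields the same iterates, which is the monotone invariance claimed above.

```python
from typing import Callable

import numpy as np


def gld_search(f: Callable[[np.ndarray], float], x0: np.ndarray, max_radius: float,
               min_radius: float, iters: int, seed: int = 0) -> np.ndarray:
    """Gradientless descent sketch: uses only comparisons of f values."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    n = x.size
    # Geometric ladder of radii R, R/2, R/4, ... until just below min_radius.
    num_scales = int(np.ceil(np.log2(max_radius / min_radius))) + 1
    radii = max_radius * 0.5 ** np.arange(num_scales)
    for _ in range(iters):
        best_x, best_f = x, fx
        for r in radii:
            # Uniform sample from the n-dimensional ball of radius r around x.
            d = rng.standard_normal(n)
            d *= r * rng.random() ** (1.0 / n) / np.linalg.norm(d)
            y = x + d
            fy = f(y)
            if fy < best_f:
                best_x, best_f = y, fy
        x, fx = best_x, best_f
    return x


# Toy objective in 20 dimensions whose value depends on only 2 of them (latent dimension 2).
def f(x: np.ndarray) -> float:
    return float(np.sum((x[:2] - 1.0) ** 2))


x_final = gld_search(f, np.zeros(20), max_radius=4.0, min_radius=1e-4, iters=300, seed=1)
print(f(x_final))
```

Trying every radius each iteration is what removes the need to know a good step size in advance; the cost is only logarithmic in $R$ divided by the minimum radius.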

arXiv:1601.08244 [pdf]

Categorical Judgment with a Variable Decision Rule

Authors: Burton Rosner, Greg Kochanski

Abstract: A new Thurstonian rating scale model uses a variable decision rule (VDR) that incorporates three previously formulated, distinct decision rules. The model includes probabilities for choosing each rule, along with Gaussian representation and criterion densities. Numerical optimisation techniques were validated by demonstrating that the model fits simulated data tightly. For simulations with 400 trials per stimulus (tps), useful information emerged about the generating parameters. However, larger experiments (e.g. 4000 tps) proved desirable for better recovery of generating parameters and to support trustworthy choices between competing models by the Akaike Information Criterion. In reanalyses of experiments by others, the VDR model explained most of the data better than did classical signal detection theory models.

Submitted 13 December, 2015; originally announced January 2016.

Comments: Contains source code as an attachment to this PDF, in files 2015-12-02_speechresearch.tgz and Rosner_Kochanski_2016.tgz
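The VDR model mixes several decision rules, and its exact equations are not reproduced here. As a hedged illustration of the building blocks, the sketch below implements only the classical fixed-criterion Thurstonian rule (a Gaussian percept $N(\mu, \sigma^2)$ compared with ordered category boundaries), the multinomial log-likelihood of observed rating counts, and the Akaike Information Criterion $\mathrm{AIC} = 2k - 2\ln L$ used to compare models. All function names and example numbers are hypothetical.

```python
import math
from typing import List, Sequence


def norm_cdf(z: float) -> float:
    """Standard normal CDF; math.erf maps +/-inf to +/-1, so infinite z works."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def category_probs(mu: float, sigma: float, criteria: Sequence[float]) -> List[float]:
    """P(response = j) for a Gaussian percept judged against fixed, ordered criteria."""
    edges = [-math.inf, *criteria, math.inf]
    cdf = [norm_cdf((c - mu) / sigma) for c in edges]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]


def log_likelihood(counts: Sequence[int], mu: float, sigma: float,
                   criteria: Sequence[float]) -> float:
    """Multinomial log-likelihood of rating counts (constant term omitted)."""
    probs = category_probs(mu, sigma, criteria)
    return sum(n * math.log(p) for n, p in zip(counts, probs) if n > 0)


def aic(log_lik: float, num_params: int) -> float:
    """Akaike Information Criterion; lower is better."""
    return 2 * num_params - 2 * log_lik


# Hypothetical ratings of one stimulus in three categories, 400 trials.
counts = [70, 270, 60]
ll = log_likelihood(counts, mu=0.0, sigma=1.0, criteria=[-1.0, 1.0])
print(category_probs(0.0, 1.0, [-1.0, 1.0]), aic(ll, num_params=3))
```

A variable-rule model would replace `category_probs` with a probability-weighted mixture over rules; comparing its AIC with that of a single-rule fit is the kind of model choice the abstract says needs thousands of trials per stimulus to be trustworthy.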

arXiv:1204.3236 [pdf]

Using Mimicry to Learn about Mental Representations

Authors: Greg Kochanski

Abstract: Phonology typically describes speech in terms of discrete signs like features. The field of intonational phonology uses discrete accents to describe intonation and prosody. But are such representations useful? The results of mimicry experiments indicate that discrete signs are not a useful representation of the shape of intonation contours. Human behaviour seems to be better represented by attractors, where memory retains substantial fine detail about an utterance. There is no evidence that any discrete abstract representations that might be formed have an effect on the speech that is subsequently produced. This paper also discusses conditions under which a discrete phonology can arise from an attractor model and why, for intonation, attractors can be inferred without implying a discrete phonology.

Submitted 15 April, 2012; originally announced April 2012.

Comments: 36 pages, plus extra figures

arXiv:1101.1682 [pdf, other]

Detecting gross alignment errors in the Spoken British National Corpus

Authors: Ladan Baghai-Ravary, Sergio Grau, Greg Kochanski

Abstract: The paper presents methods for evaluating the accuracy of alignments between transcriptions and audio recordings. The methods have been applied to the Spoken British National Corpus, which is an extensive and varied corpus of natural unscripted speech. Early results show good agreement with human ratings of alignment accuracy. The methods also provide an indication of the location of likely alignment problems; this should allow efficient manual examination of large corpora. Automatic checking of such alignments is crucial when analysing any very large corpus, since even the best current speech alignment systems will occasionally make serious errors. The methods described here use a hybrid approach based on statistics of the speech signal itself, statistics of the labels being evaluated, and statistics linking the two.

Submitted 9 January, 2011; originally announced January 2011.

Comments: Four pages, 3 figures. Presented at "New Tools and Methods for Very-Large-Scale Phonetics Research", University of Pennsylvania, January 28-31, 2011
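The abstract does not say which statistics are used. As one hypothetical example of a statistic linking labels and signal, the sketch below computes each aligned segment's speaking rate (transcript words per second of audio) and flags segments whose rate is a robust outlier by a median/MAD score. The function name, the choice of statistic, and the 3.5 threshold are assumptions of this sketch, not the paper's method; the point is only that gross misalignments tend to produce physically implausible rates.

```python
import statistics
from typing import List, Tuple


def flag_suspect_segments(segments: List[Tuple[str, float, float]],
                          threshold: float = 3.5) -> List[int]:
    """Return indices of segments whose words-per-second rate is a robust outlier.

    segments: (transcript_text, start_seconds, end_seconds) for each aligned chunk.
    """
    rates = []
    for text, start, end in segments:
        duration = max(end - start, 1e-3)  # guard against zero-length alignments
        rates.append(len(text.split()) / duration)
    med = statistics.median(rates)
    mad = statistics.median(abs(r - med) for r in rates) or 1e-9
    # 0.6745 rescales the MAD so scores are comparable to z-scores under normality.
    return [i for i, r in enumerate(rates) if abs(0.6745 * (r - med) / mad) > threshold]


segments = [
    ("the cat sat on the mat", 0.0, 1.8),
    ("and then it went back home", 2.0, 4.0),
    ("where it slept for an hour", 4.0, 6.2),
    ("before waking up very hungry again", 6.2, 8.1),
    ("so it went looking for food", 8.1, 10.2),
    ("but the transcript here has far too many words squeezed in", 10.2, 10.4),
    ("hello", 10.4, 30.4),
]
print(flag_suspect_segments(segments))
```

The returned indices double as locations, which matches the abstract's aim of pointing a human checker at likely problem regions rather than re-examining the whole corpus.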

arXiv:1012.2797 [pdf, ps, other]

Should Corpora be Big, Rich, or Dense?

Authors: Greg P. Kochanski, Chilin Shih, Ryan Shosted

Abstract: In this paper, we ask what properties make a large corpus more or less useful. We suggest that size, by itself, should not be the ultimate goal of building a corpus. Large-scale corpora are considered desirable because they offer statistical stability and rich variation. But this rich variation means more factors to control and evaluate, which can limit the advantages of size. We discuss the use of multi-channel data to complement large-scale speech corpora. Even though multi-channel data may limit the scale of a corpus (due to the complex and labor-intensive nature of data collection), they can offer information that allows us to tease apart various factors related to speech production.

Submitted 13 December, 2010; originally announced December 2010.

arXiv:1008.1596 [pdf]

Bootstrap Markov chain Monte Carlo and optimal solutions for the Law of Categorical Judgment (Corrected)

Authors: Greg Kochanski, Burton S. Rosner

Abstract: A novel procedure is described for accelerating the convergence of Markov chain Monte Carlo computations. The algorithm uses an adaptive bootstrap technique to generate candidate steps in the Markov chain. It is efficient for symmetric, convex probability distributions, similar to multivariate Gaussians, and it can be used for Bayesian estimation or for obtaining maximum likelihood solutions with confidence limits. As a test case, the Law of Categorical Judgment (Corrected) was fitted with the algorithm to data sets from simulated rating scale experiments. The correct parameters were recovered from practical-sized data sets simulated for Full Signal Detection Theory and its special cases of standard Signal Detection Theory and Complementary Signal Detection Theory.

Submitted 9 August, 2010; originally announced August 2010.

MSC Class: 65C05

ACM Class: G.3
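A minimal sketch of one way an "adaptive bootstrap" step generator can work, under the assumption that candidate steps are scaled differences between two randomly chosen earlier states of the chain (an archive), combined with the ordinary Metropolis acceptance rule. Because the difference $a - b$ is as likely as $b - a$, the proposal is symmetric, and because the archive tracks the chain's recent spread, step sizes adapt to the target's scale and correlations. The function name, archive size, and scale are hypothetical, and this is not presented as the paper's exact algorithm.

```python
from typing import Callable

import numpy as np


def bootstrap_mcmc(log_p: Callable[[np.ndarray], float], x0, n_steps: int,
                   archive_size: int = 1000, scale: float = 0.8,
                   seed: int = 0) -> np.ndarray:
    """Metropolis sampler whose candidate steps are differences of archived states."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp = log_p(x)
    # Start the archive with a small cloud around x0 so early differences are non-zero.
    archive = [x + 0.1 * rng.standard_normal(x.size) for _ in range(10)]
    samples = np.empty((n_steps, x.size))
    for t in range(n_steps):
        # Pick two distinct archived states uniformly at random.
        i = rng.integers(len(archive))
        j = rng.integers(len(archive) - 1)
        if j >= i:
            j += 1
        # a - b and b - a are equally likely, so the proposal is symmetric
        # and the plain Metropolis acceptance test applies.
        candidate = x + scale * (archive[i] - archive[j])
        lp_cand = log_p(candidate)
        if np.log(rng.random()) < lp_cand - lp:
            x, lp = candidate, lp_cand
        samples[t] = x
        archive.append(x.copy())
        if len(archive) > archive_size:
            archive.pop(0)  # forget the oldest state so step sizes keep adapting
    return samples


def log_p(v: np.ndarray) -> float:
    # Standard 2-D Gaussian target (log density up to a constant).
    return -0.5 * float(v @ v)


draws = bootstrap_mcmc(log_p, [2.0, -2.0], 20000, seed=1)[2000:]  # drop burn-in
print(draws.mean(axis=0), draws.var(axis=0))
```

Strictly, a proposal that depends on the chain's own history makes the process adaptive rather than a fixed Markov chain; in practice one can freeze the archive after burn-in if exact Markov properties matter.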

arXiv:astro-ph/9801193 [pdf, ps, other]

doi 10.1086/311314

Detailed Mass Map of CL0024+1654 from Strong Lensing

Authors: J. Anthony Tyson, Greg P. Kochanski, Ian P. Dell'Antonio

Abstract: We construct a high-resolution mass map of the $z=0.39$ cluster 0024+1654, based on parametric inversion of the associated gravitational lens. The lens creates eight well-resolved sub-images of a background galaxy, seen in deep imaging with HST. Excluding mass concentrations centered on visible galaxies, more than 98% of the remaining mass is represented by a smooth concentration of dark matter centered near the brightest cluster galaxies, with a $35\,h^{-1}$ kpc soft core. The asymmetry in the mass distribution is <3% inside a $107\,h^{-1}$ kpc radius. The dark matter distribution we observe in CL0024 is far more smooth, symmetric, and nonsingular than in typical simulated clusters in either $\Omega=1$ or $\Omega=0.3$ CDM cosmologies. Integrated to a $107\,h^{-1}$ kpc radius, the rest-frame mass-to-light ratio is $M/L_V = 276 \pm 40\,h\,(M/L_V)_\odot$.

Submitted 2 May, 1998; v1 submitted 20 January, 1998; originally announced January 1998.

Comments: 16 pages, 4 figures (3 .jpg, 1 .ps), minor changes to make consistent with the final ApJL article. To appear in ApJL, May 8 1998

Report number: Bell Labs TM-970624-23

Journal ref: Astrophys.J.498:L107,1998

arXiv:astro-ph/9601180 [pdf, ps, other]

doi 10.1086/117889

Flickering Faint Galaxies: Few and Far Between

Authors: Greg P. Kochanski, J. Anthony Tyson, Philippe Fischer

Abstract: Optical variability in galaxies at high redshift is a tracer of evolution in AGN activity, and should provide a useful constraint on models of galaxy evolution, AGN structure, and cosmology. We studied optical variability in multiple deep CCD and photographic surveys of blank fields for galaxies with $B_j = 20 - 25$ mag. Weakly variable objects are far more common than strongly variable ones. For objects near $B_j = 22$, $0.74\% \pm 0.2\%$ vary by 0.026 mag RMS or more, over a decade. This is small compared with previous claims based on photographic surveys, and also small compared with the fraction of bright quasars ($\approx 5\%$ at $B_j = 20$ mag) or Seyferts ($\approx 1-2\%$ for $B_j < 18$). The fraction of objects that vary increases slowly with magnitude. Detection probabilities and error rates were checked by simulations and statistical analysis of fluctuations of sample sky spots.

Submitted 30 January, 1996; originally announced January 1996.

Comments: AAS LaTeX file, full paper with figures available at http://www.astro.lsa.umich.edu:80/users/philf/ Accepted for publication in the AJ
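A hedged sketch of the basic quantity behind a statement like "0.74% vary by 0.026 mag RMS or more": each object's RMS magnitude variation across epochs, with the expected photometric noise removed in quadrature, and the fraction of objects above the threshold with a simple binomial standard error $\sqrt{p(1-p)/N}$. This excess-variance estimator and the synthetic data are assumptions of the sketch; the paper's detection probabilities and error rates came from simulations and sky-spot statistics, which are not reproduced here.

```python
import numpy as np


def intrinsic_rms(mags: np.ndarray, errors: np.ndarray) -> np.ndarray:
    """Per-object RMS variability after subtracting photometric noise in quadrature.

    mags, errors: arrays of shape (n_objects, n_epochs), in magnitudes.
    """
    observed_var = mags.var(axis=1, ddof=1)
    noise_var = np.mean(errors ** 2, axis=1)
    return np.sqrt(np.clip(observed_var - noise_var, 0.0, None))


def variable_fraction(mags: np.ndarray, errors: np.ndarray, threshold: float = 0.026):
    """Fraction of objects with intrinsic RMS >= threshold, and its binomial standard error."""
    rms = intrinsic_rms(mags, errors)
    n = rms.size
    p = np.count_nonzero(rms >= threshold) / n
    return p, np.sqrt(p * (1 - p) / n)


# Synthetic survey: 1000 objects, 10 epochs, 0.01 mag errors; the first 20 truly vary.
rng = np.random.default_rng(0)
n_obj, n_ep = 1000, 10
errors = np.full((n_obj, n_ep), 0.01)
mags = 22.0 + rng.normal(0.0, 0.01, size=(n_obj, n_ep))
mags[:20] += 0.1 * np.sqrt(2) * np.sin(2 * np.pi * np.arange(n_ep) / n_ep)
p, err = variable_fraction(mags, errors)
print(f"variable fraction: {p:.2%} +/- {err:.2%}")
```

Subtracting the noise variance matters most for weakly variable objects, which the abstract notes are far more common than strongly variable ones; without it, noisy photometry alone would inflate the apparent variable fraction.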

arXiv:astro-ph/9310031 [pdf, ps]

doi 10.1086/116898

Optimal Addition of Images for Detection and Photometry

Authors: Philippe Fischer, Greg P. Kochanski

Abstract: In this paper we describe weighting techniques used for the optimal coaddition of CCD frames with differing characteristics. Optimal means maximum signal-to-noise (s/n) for stellar objects. We derive formulae for four applications: 1) object detection via matched filter, 2) object detection identical to DAOFIND, 3) aperture photometry, and 4) ALLSTAR profile-fitting photometry. We have included examples involving 21 frames for which either the sky brightness or image resolution varied by a factor of three. The gains in s/n were modest for most of the examples, except for DAOFIND detection with varying image resolution, which exhibited a substantial s/n increase. Even though the only consideration was maximizing s/n, the image resolution was seen to improve for most of the variable resolution examples. Also discussed are empirical fits for the weighting and the availability of the program, WEIGHT, used to generate the weighting for the individual frames. Finally, we include appendices describing the effects of clipping algorithms and a scheme for star/galaxy and cosmic ray/star discrimination.

Submitted 14 October, 1993; originally announced October 1993.

Comments: 27 pages (uuencoded compressed postscript), 1993

Journal ref: Astron.J. 107 (1994) 802-810
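The paper derives application-specific weights that also account for seeing and for the detection or photometry method. The sketch below covers only the generic case underneath them: frames with per-frame signal $S_i$ and independent noise $N_i$, combined as $\sum_i w_i\,\text{frame}_i$. By the Cauchy-Schwarz inequality, the s/n of that sum is maximized by $w_i \propto S_i / N_i^2$, giving s/n $= \sqrt{\sum_i (S_i/N_i)^2}$. The function names and the example noise values are hypothetical.

```python
import numpy as np


def coadd_snr(weights, signal, noise) -> float:
    """s/n of the weighted sum sum_i w_i * frame_i, assuming independent frame noise."""
    w, s, n = (np.asarray(a, dtype=float) for a in (weights, signal, noise))
    return float(w @ s / np.sqrt(np.sum((w * n) ** 2)))


def optimal_weights(signal, noise) -> np.ndarray:
    """Weights proportional to S_i / N_i**2, normalised to sum to one."""
    s, n = np.asarray(signal, dtype=float), np.asarray(noise, dtype=float)
    w = s / n ** 2
    return w / w.sum()


# Hypothetical example: 21 frames, equal star flux, per-frame noise spanning a factor of 3.
signal = np.ones(21)
noise = np.linspace(1.0, 3.0, 21)
equal = coadd_snr(np.ones(21), signal, noise)
best = coadd_snr(optimal_weights(signal, noise), signal, noise)
print(f"equal weights: {equal:.3f}, optimal weights: {best:.3f}")
```

The gain over equal weighting depends on how much the frames differ; for a moderate spread like this one it is modest, consistent in spirit with the modest gains the abstract reports for most examples.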

Showing 1–11 of 11 results for author: Kochanski, G