
Showing 1–19 of 19 results for author: Perrone, V

  1. arXiv:2405.02267  [pdf, other]

    cs.LG cs.CL

    Structural Pruning of Pre-trained Language Models via Neural Architecture Search

    Authors: Aaron Klein, Jacek Golebiowski, Xingchen Ma, Valerio Perrone, Cedric Archambeau

    Abstract: Pre-trained language models (PLM), for example BERT or RoBERTa, mark the state-of-the-art for natural language understanding tasks when fine-tuned on labeled data. However, their large size poses challenges in deploying them for inference in real-world applications, due to significant GPU memory requirements and high inference latency. This paper explores neural architecture search (NAS) for struct…

    Submitted 3 May, 2024; originally announced May 2024.

  2. arXiv:2111.06924  [pdf, other]

    cs.LG

    A Simple and Fast Baseline for Tuning Large XGBoost Models

    Authors: Sanyam Kapoor, Valerio Perrone

    Abstract: XGBoost, a scalable tree boosting algorithm, has proven effective for many prediction tasks of practical interest, especially using tabular datasets. Hyperparameter tuning can further improve the predictive performance, but unlike neural networks, full-batch training of many models on large datasets can be time-consuming. Owing to the discovery that (i) there is a strong linear relation between da…

    Submitted 12 November, 2021; originally announced November 2021.

    Comments: Technical Report

  3. arXiv:2106.06079  [pdf, other]

    cs.LG stat.ML

    A Nonmyopic Approach to Cost-Constrained Bayesian Optimization

    Authors: Eric Hans Lee, David Eriksson, Valerio Perrone, Matthias Seeger

    Abstract: Bayesian optimization (BO) is a popular method for optimizing expensive-to-evaluate black-box functions. BO budgets are typically given in iterations, which implicitly assumes each evaluation has the same cost. In fact, in many BO applications, evaluation costs vary significantly in different regions of the search space. In hyperparameter optimization, the time spent on neural network training inc…

    Submitted 10 June, 2021; originally announced June 2021.

    Comments: To appear in UAI 2021

  4. arXiv:2106.05680  [pdf, other]

    cs.LG

    A multi-objective perspective on jointly tuning hardware and hyperparameters

    Authors: David Salinas, Valerio Perrone, Olivier Cruchant, Cedric Archambeau

    Abstract: In addition to the best model architecture and hyperparameters, a full AutoML solution requires selecting appropriate hardware automatically. This can be framed as a multi-objective optimization problem: there is not a single best hardware configuration but a set of optimal ones achieving different trade-offs between cost and runtime. In practice, some choices may be overly costly or take days to…

    Submitted 10 June, 2021; originally announced June 2021.

  5. arXiv:2104.08166  [pdf, other]

    cs.LG cs.AI stat.ML

    Automatic Termination for Hyperparameter Optimization

    Authors: Anastasia Makarova, Huibin Shen, Valerio Perrone, Aaron Klein, Jean Baptiste Faddoul, Andreas Krause, Matthias Seeger, Cedric Archambeau

    Abstract: Bayesian optimization (BO) is a widely popular approach for hyperparameter optimization (HPO) in machine learning. At its core, BO iteratively evaluates promising configurations until a user-defined budget, such as wall-clock time or number of iterations, is exhausted. While the final performance after tuning heavily depends on the provided budget, it is hard to pre-specify an optimal value in…

    Submitted 22 July, 2022; v1 submitted 16 April, 2021; originally announced April 2021.

    Comments: Accepted at AutoML Conference 2022

  6. arXiv:2101.09069  [pdf, other]

    cs.CL cs.LG

    Lexical semantic change for Ancient Greek and Latin

    Authors: Valerio Perrone, Simon Hengchen, Marco Palma, Alessandro Vatri, Jim Q. Smith, Barbara McGillivray

    Abstract: Change and its precondition, variation, are inherent in languages. Over time, new words enter the lexicon, others become obsolete, and existing words acquire new senses. Associating a word's correct meaning in its historical context is a central challenge in diachronic research. Historical corpora of classical languages, such as Ancient Greek and Latin, typically come with rich metadata, and exist…

    Submitted 22 January, 2021; originally announced January 2021.

    Comments: To appear in: Nina Tahmasebi, Lars Borin, Adam Jatowt, Yang Xu, Simon Hengchen (eds). Computational Approaches to Semantic Change. Berlin: Language Science Press. [preliminary page numbering]

  7. arXiv:2012.08489  [pdf, other]

    cs.LG cs.AI stat.ML

    Amazon SageMaker Automatic Model Tuning: Scalable Gradient-Free Optimization

    Authors: Valerio Perrone, Huibin Shen, Aida Zolic, Iaroslav Shcherbatyi, Amr Ahmed, Tanya Bansal, Michele Donini, Fela Winkelmolen, Rodolphe Jenatton, Jean Baptiste Faddoul, Barbara Pogorzelska, Miroslav Miladinovic, Krishnaram Kenthapadi, Matthias Seeger, Cédric Archambeau

    Abstract: Tuning complex machine learning systems is challenging. Machine learning typically requires setting hyperparameters, be it regularization, architecture, or optimization parameters, whose tuning is critical to achieve good predictive performance. To democratize access to machine learning systems, it is essential to automate the tuning. This paper presents Amazon SageMaker Automatic Model Tuning (AMT…

    Submitted 18 June, 2021; v1 submitted 15 December, 2020; originally announced December 2020.

  8. arXiv:2012.08483  [pdf, other]

    cs.LG

    Amazon SageMaker Autopilot: a white box AutoML solution at scale

    Authors: Piali Das, Valerio Perrone, Nikita Ivkin, Tanya Bansal, Zohar Karnin, Huibin Shen, Iaroslav Shcherbatyi, Yotam Elor, Wilton Wu, Aida Zolic, Thibaut Lienart, Alex Tang, Amr Ahmed, Jean Baptiste Faddoul, Rodolphe Jenatton, Fela Winkelmolen, Philip Gautier, Leo Dirac, Andre Perunicic, Miroslav Miladinovic, Giovanni Zappella, Cédric Archambeau, Matthias Seeger, Bhaskar Dutt, Laurence Rouesnel

    Abstract: AutoML systems provide a black-box solution to machine learning problems by selecting the right way of processing features, choosing an algorithm and tuning the hyperparameters of the entire pipeline. Although these systems perform well on many datasets, there is still a non-negligible number of datasets for which the one-shot solution produced by each particular system would provide sub-par perfo…

    Submitted 16 December, 2020; v1 submitted 15 December, 2020; originally announced December 2020.

  9. arXiv:2011.11456  [pdf, other]

    cs.LG stat.ML

    Pareto-efficient Acquisition Functions for Cost-Aware Bayesian Optimization

    Authors: Gauthier Guinet, Valerio Perrone, Cédric Archambeau

    Abstract: Bayesian optimization (BO) is a popular method to optimize expensive black-box functions. It efficiently tunes machine learning algorithms under the implicit assumption that hyperparameter evaluations cost approximately the same. In reality, the cost of evaluating different hyperparameters, be it in terms of time, dollars or energy, can span several orders of magnitude of difference. While a numbe…

    Submitted 24 November, 2020; v1 submitted 23 November, 2020; originally announced November 2020.

    Comments: 11 pages, 9 figures, 4th Workshop on Meta-Learning at NeurIPS 2020

  10. arXiv:2006.05109  [pdf, other]

    stat.ML cs.LG

    Fair Bayesian Optimization

    Authors: Valerio Perrone, Michele Donini, Muhammad Bilal Zafar, Robin Schmucker, Krishnaram Kenthapadi, Cédric Archambeau

    Abstract: Given the increasing importance of machine learning (ML) in our lives, several algorithmic fairness techniques have been proposed to mitigate biases in the outcomes of the ML models. However, most of these techniques are specialized to cater to a single family of ML models and a specific definition of fairness, limiting their adaptability in practice. We introduce a general constrained Bayesian op…

    Submitted 18 June, 2021; v1 submitted 9 June, 2020; originally announced June 2020.

  11. arXiv:2003.10870  [pdf, other]

    cs.LG stat.ML

    Cost-aware Bayesian Optimization

    Authors: Eric Hans Lee, Valerio Perrone, Cedric Archambeau, Matthias Seeger

    Abstract: Bayesian optimization (BO) is a class of global optimization algorithms, suitable for minimizing an expensive objective function in as few function evaluations as possible. While BO budgets are typically given in iterations, this implicitly measures convergence in terms of iteration count and assumes each evaluation has identical cost. In practice, evaluation costs may vary in different regions of…

    Submitted 22 March, 2020; originally announced March 2020.

  12. arXiv:1910.07003  [pdf, other]

    stat.ML cs.LG

    Constrained Bayesian Optimization with Max-Value Entropy Search

    Authors: Valerio Perrone, Iaroslav Shcherbatyi, Rodolphe Jenatton, Cedric Archambeau, Matthias Seeger

    Abstract: Bayesian optimization (BO) is a model-based approach to sequentially optimize expensive black-box functions, such as the validation error of a deep neural network with respect to its hyperparameters. In many real-world scenarios, the optimization is further subject to a priori unknown constraints. For example, training a deep network configuration may fail with an out-of-memory error when the mode…

    Submitted 15 October, 2019; originally announced October 2019.

  13. arXiv:1909.13595  [pdf, other]

    stat.ML cs.LG

    A Quantile-based Approach for Hyperparameter Transfer Learning

    Authors: David Salinas, Huibin Shen, Valerio Perrone

    Abstract: Bayesian optimization (BO) is a popular methodology to tune the hyperparameters of expensive black-box functions. Traditionally, BO focuses on a single task at a time and is not designed to leverage information from related functions, such as tuning performance objectives of the same algorithm across multiple datasets. In this work, we introduce a novel approach to achieve transfer learning across…

    Submitted 19 April, 2021; v1 submitted 30 September, 2019; originally announced September 2019.

  14. arXiv:1909.12552  [pdf, other]

    stat.ML cs.LG

    Learning search spaces for Bayesian optimization: Another view of hyperparameter transfer learning

    Authors: Valerio Perrone, Huibin Shen, Matthias Seeger, Cedric Archambeau, Rodolphe Jenatton

    Abstract: Bayesian optimization (BO) is a successful methodology to optimize black-box functions that are expensive to evaluate. While traditional methods optimize each black-box function in isolation, there has been recent interest in speeding up BO by transferring knowledge across multiple related black-box functions. In this work, we introduce a method to automatically design the BO search space by relyi…

    Submitted 27 September, 2019; originally announced September 2019.

  15. arXiv:1903.05587  [pdf, other]

    cs.CL cs.LG stat.ML

    GASC: Genre-Aware Semantic Change for Ancient Greek

    Authors: Valerio Perrone, Marco Palma, Simon Hengchen, Alessandro Vatri, Jim Q. Smith, Barbara McGillivray

    Abstract: Word meaning changes over time, depending on linguistic and extra-linguistic factors. Associating a word's correct meaning in its historical context is a central challenge in diachronic research, and is relevant to a range of NLP tasks, including information retrieval and semantic search in historical texts. Bayesian models for semantic change have emerged as a powerful tool to address this challe…

    Submitted 2 June, 2019; v1 submitted 13 March, 2019; originally announced March 2019.

  16. arXiv:1802.06153  [pdf, other]

    cs.LG q-bio.PE stat.ML

    A Likelihood-Free Inference Framework for Population Genetic Data using Exchangeable Neural Networks

    Authors: Jeffrey Chan, Valerio Perrone, Jeffrey P. Spence, Paul A. Jenkins, Sara Mathieson, Yun S. Song

    Abstract: An explosion of high-throughput DNA sequencing in the past decade has led to a surge of interest in population-scale inference with whole-genome data. Recent work in population genetics has centered on designing inference methods for relatively simple model classes, and few scalable general-purpose inference techniques exist for more realistic, complex models. To achieve this, two inferential chal…

    Submitted 5 November, 2018; v1 submitted 16 February, 2018; originally announced February 2018.

    Comments: 9 pages, 8 figures

  17. arXiv:1712.02902  [pdf, other]

    stat.ML

    Multiple Adaptive Bayesian Linear Regression for Scalable Bayesian Optimization with Warm Start

    Authors: Valerio Perrone, Rodolphe Jenatton, Matthias Seeger, Cedric Archambeau

    Abstract: Bayesian optimization (BO) is a model-based approach for gradient-free black-box function optimization. Typically, BO is powered by a Gaussian process (GP), whose algorithmic complexity is cubic in the number of evaluations. Hence, GP-based BO cannot leverage large amounts of past or related function evaluations, for example, to warm start the BO procedure. We develop a multiple adaptive Bayesian…

    Submitted 7 December, 2017; originally announced December 2017.

  18. arXiv:1611.07460  [pdf, other]

    stat.ML

    Poisson Random Fields for Dynamic Feature Models

    Authors: Valerio Perrone, Paul A. Jenkins, Dario Spano, Yee Whye Teh

    Abstract: We present the Wright-Fisher Indian buffet process (WF-IBP), a probabilistic model for time-dependent data assumed to have been generated by an unknown number of latent features. This model is suitable as a prior in Bayesian nonparametric feature allocation models in which the features underlying the observed data exhibit a dependency structure over time. More specifically, we establish a new fram…

    Submitted 22 November, 2016; originally announced November 2016.

  19. arXiv:1609.04388  [pdf, other]

    stat.ML

    Relativistic Monte Carlo

    Authors: Xiaoyu Lu, Valerio Perrone, Leonard Hasenclever, Yee Whye Teh, Sebastian J. Vollmer

    Abstract: Hamiltonian Monte Carlo (HMC) is a popular Markov chain Monte Carlo (MCMC) algorithm that generates proposals for a Metropolis-Hastings algorithm by simulating the dynamics of a Hamiltonian system. However, HMC is sensitive to large time discretizations and performs poorly if there is a mismatch between the spatial geometry of the target distribution and the scales of the momentum distribution. In…

    Submitted 14 September, 2016; originally announced September 2016.