Showing 1–7 of 7 results for author: Pereyra, G

  1. arXiv:2212.12017  [pdf, other]

    cs.CL

    OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization

    Authors: Srinivasan Iyer, Xi Victoria Lin, Ramakanth Pasunuru, Todor Mihaylov, Daniel Simig, Ping Yu, Kurt Shuster, Tianlu Wang, Qing Liu, Punit Singh Koura, Xian Li, Brian O'Horo, Gabriel Pereyra, Jeff Wang, Christopher Dewan, Asli Celikyilmaz, Luke Zettlemoyer, Ves Stoyanov

    Abstract: Recent work has shown that fine-tuning large pre-trained language models on a collection of tasks described via instructions, a.k.a. instruction-tuning, improves their zero and few-shot generalization to unseen tasks. However, there is a limited understanding of the performance trade-offs of different decisions made during the instruction-tuning process. These decisions include the scale and diver…

    Submitted 30 January, 2023; v1 submitted 22 December, 2022; originally announced December 2022.

    Comments: 56 pages. v2->v3: fix OPT-30B evaluation results across benchmarks (previously we reported lower performance of this model due to an evaluation pipeline bug)

  2. arXiv:1804.03235  [pdf, other]

    cs.LG cs.AI stat.ML

    Large scale distributed neural network training through online distillation

    Authors: Rohan Anil, Gabriel Pereyra, Alexandre Passos, Robert Ormandi, George E. Dahl, Geoffrey E. Hinton

    Abstract: Techniques such as ensembling and distillation promise model quality improvements when paired with almost any base model. However, due to increased test-time cost (for ensembles) and increased complexity of the training pipeline (for distillation), these techniques are challenging to use in industrial settings. In this paper we explore a variant of distillation which is relatively straightforward…

    Submitted 20 August, 2020; v1 submitted 9 April, 2018; originally announced April 2018.

    Comments: Clarify that implementations should use available parallelism in pseudo-code

  3. arXiv:1706.03859  [pdf]

    physics.soc-ph cs.CY cs.SI

    Size invariance sector for an agent-based innovation diffusion model

    Authors: Carlos E. Laciana, Gustavo Pereyra, Santiago L. Rovere

    Abstract: It is shown that under certain conditions it is possible to model a complex system in a way that leads to results that do not depend on system size. As an example of complex system an innovation diffusion model is considered. In that model a set of individuals (the agents), which are interconnected, must decide if adopt or not an innovation. The agents are connected in a member of the networks fam…

    Submitted 27 July, 2017; v1 submitted 12 June, 2017; originally announced June 2017.

    Comments: 10 pages

  4. arXiv:1701.06548  [pdf, other]

    cs.NE cs.LG

    Regularizing Neural Networks by Penalizing Confident Output Distributions

    Authors: Gabriel Pereyra, George Tucker, Jan Chorowski, Łukasz Kaiser, Geoffrey Hinton

    Abstract: We systematically explore regularizing neural networks by penalizing low entropy output distributions. We show that penalizing low entropy output distributions, which has been shown to improve exploration in reinforcement learning, acts as a strong regularizer in supervised learning. Furthermore, we connect a maximum entropy based confidence penalty to label smoothing through the direction of the…

    Submitted 23 January, 2017; originally announced January 2017.

    Comments: Submitted to ICLR 2017

  5. arXiv:1510.01378  [pdf, other]

    stat.ML cs.LG cs.NE

    Batch Normalized Recurrent Neural Networks

    Authors: César Laurent, Gabriel Pereyra, Philémon Brakel, Ying Zhang, Yoshua Bengio

    Abstract: Recurrent Neural Networks (RNNs) are powerful models for sequential data that have the potential to learn long-term dependencies. However, they are computationally expensive to train and difficult to parallelize. Recent work has shown that normalizing intermediate representations of neural networks can significantly improve convergence rates in feedforward neural networks. In particular, batch no…

    Submitted 5 October, 2015; originally announced October 2015.

  6. arXiv:1307.3611  [pdf, other]

    physics.chem-ph cond-mat.stat-mech

    On the relation between hydrogen bonds, tetrahedral order and molecular mobility in model water

    Authors: R. G. Pereyra, A. Bermudez di Lorenzo, D. C. Malaspina, M. A. Carignano

    Abstract: We studied by molecular dynamics simulations the relation existing between the lifetime of hydrogen bonds, the tetrahedral order and the diffusion coefficient of model water. We tested four different models: SPC/E, TIP4P-Ew, TIP5P-Ew and Six-site, these last two having sites explicitly resembling the water lone pairs. While all the models perform reasonably well at ambient conditions, their behavi…

    Submitted 13 July, 2013; originally announced July 2013.

    Comments: 13 pages, 5 figures

    Journal ref: Chemical Physics Letters, v. 538, pp. 35-38 (2012)

  7. arXiv:1307.3405  [pdf, other]

    physics.chem-ph cond-mat.soft

    The water supercooled regime as described by four common water models

    Authors: David C. Malaspina, Aleida J. Bermudez di Lorenzo, Rodolfo G. Pereyra, Igal Szleifer, Marcelo A. Carignano

    Abstract: The temperature scale of simple water models in general does not coincide with the natural one. Therefore, in order to make a meaningful evaluation of different water models a temperature rescaling is necessary. In this paper we introduce a rescaling using the melting temperature and the temperature corresponding to the maximum of the heat capacity to evaluate four common water models (TIP4P-Ew, T…

    Submitted 12 July, 2013; originally announced July 2013.

    Comments: 8 pages, 8 figures