
Showing 1–6 of 6 results for author: Wiseman, S

  1. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  2. arXiv:2004.03991  [pdf, other]

    cs.LG cs.IT stat.ML

    Learning Discrete Structured Representations by Adversarially Maximizing Mutual Information

    Authors: Karl Stratos, Sam Wiseman

    Abstract: We propose learning discrete structured representations from unlabeled data by maximizing the mutual information between a structured latent variable and a target variable. Calculating mutual information is intractable in this setting. Our key technical contribution is an adversarial objective that can be used to tractably estimate mutual information assuming only the feasibility of cross entropy…

    Submitted 15 July, 2020; v1 submitted 8 April, 2020; originally announced April 2020.

    Comments: ICML 2020

  3. arXiv:1906.06399  [pdf, other]

    cs.LG stat.ML

    Amortized Bethe Free Energy Minimization for Learning MRFs

    Authors: Sam Wiseman, Yoon Kim

    Abstract: We propose to learn deep undirected graphical models (i.e., MRFs) with a non-ELBO objective for which we can calculate exact gradients. In particular, we optimize a saddle-point objective deriving from the Bethe free energy approximation to the partition function. Unlike much recent work in approximate inference, the derived objective requires no sampling, and can be efficiently computed even for…

    Submitted 17 November, 2019; v1 submitted 14 June, 2019; originally announced June 2019.

    Comments: NeurIPS 2019

  4. arXiv:1812.06834  [pdf, other]

    cs.CL cs.LG stat.ML

    A Tutorial on Deep Latent Variable Models of Natural Language

    Authors: Yoon Kim, Sam Wiseman, Alexander M. Rush

    Abstract: There has been much recent, exciting work on combining the complementary strengths of latent variable models and deep learning. Latent variable modeling makes it easy to explicitly specify model constraints through conditional independence properties, while deep learning makes it possible to parameterize these conditional likelihoods with powerful function approximators. While these "deep latent v…

    Submitted 4 August, 2019; v1 submitted 17 December, 2018; originally announced December 2018.

    Comments: EMNLP 2018 Tutorial

  5. arXiv:1802.02550  [pdf, other]

    stat.ML cs.CL cs.LG

    Semi-Amortized Variational Autoencoders

    Authors: Yoon Kim, Sam Wiseman, Andrew C. Miller, David Sontag, Alexander M. Rush

    Abstract: Amortized variational inference (AVI) replaces instance-specific local inference with a global inference network. While AVI has enabled efficient training of deep generative models such as variational autoencoders (VAE), recent empirical work suggests that inference networks can produce suboptimal variational parameters. We propose a hybrid approach, to use AVI to initialize the variational parame…

    Submitted 23 July, 2018; v1 submitted 7 February, 2018; originally announced February 2018.

    Comments: ICML 2018

  6. arXiv:1606.02960  [pdf, other]

    cs.CL cs.LG cs.NE stat.ML

    Sequence-to-Sequence Learning as Beam-Search Optimization

    Authors: Sam Wiseman, Alexander M. Rush

    Abstract: Sequence-to-Sequence (seq2seq) modeling has rapidly become an important general-purpose NLP tool that has proven effective for many text-generation and sequence-labeling tasks. Seq2seq builds on deep neural language modeling and inherits its remarkable accuracy in estimating local, next-word distributions. In this work, we introduce a model and beam-search training scheme, based on the work of Dau…

    Submitted 9 November, 2016; v1 submitted 9 June, 2016; originally announced June 2016.

    Comments: EMNLP 2016 camera-ready