Showing 1–14 of 14 results for author: Gershman, S

  1. arXiv:2310.06110  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Grokking as the Transition from Lazy to Rich Training Dynamics

    Authors: Tanishq Kumar, Blake Bordelon, Samuel J. Gershman, Cengiz Pehlevan

    Abstract: We propose that the grokking phenomenon, where the train loss of a neural network decreases much earlier than its test loss, can arise due to a neural network transitioning from lazy training dynamics to a rich, feature learning regime. To illustrate this mechanism, we study the simple setting of vanilla gradient descent on a polynomial regression problem with a two layer neural network which exhi…

    Submitted 11 April, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: Adding new experiments on higher degree Hermite polynomials, multi-index targets, removed DMFT analysis from this version

  2. arXiv:2012.15814  [pdf, other]

    cs.LG cs.CL cs.CV stat.ML

    Language-Mediated, Object-Centric Representation Learning

    Authors: Ruocheng Wang, Jiayuan Mao, Samuel J. Gershman, Jiajun Wu

    Abstract: We present Language-mediated, Object-centric Representation Learning (LORL), a paradigm for learning disentangled, object-centric scene representations from vision and language. LORL builds upon recent advances in unsupervised object discovery and segmentation, notably MONet and Slot Attention. While these algorithms learn an object-centric representation just by reconstructing the input image, LO…

    Submitted 8 June, 2021; v1 submitted 31 December, 2020; originally announced December 2020.

    Comments: ACL 2021 Findings. First two authors contributed equally; last two authors contributed equally. Project page: https://lang-orl.github.io/

  3. arXiv:1909.05885  [pdf, other]

    cs.CL cs.LG stat.ML

    Analyzing machine-learned representations: A natural language case study

    Authors: Ishita Dasgupta, Demi Guo, Samuel J. Gershman, Noah D. Goodman

    Abstract: As modern deep networks become more complex, and get closer to human-like capabilities in certain domains, the question arises of how the representations and decision rules they learn compare to the ones in humans. In this work, we study representations of sentences in one such artificial system for natural language processing. We first present a diagnostic test dataset to examine the degree of ab…

    Submitted 12 September, 2019; originally announced September 2019.

    Comments: This article supersedes a previous article arXiv:1802.04302

  4. arXiv:1902.00006  [pdf, other]

    cs.LG stat.ML

    An Evaluation of the Human-Interpretability of Explanation

    Authors: Isaac Lage, Emily Chen, Jeffrey He, Menaka Narayanan, Been Kim, Sam Gershman, Finale Doshi-Velez

    Abstract: Recent years have seen a boom in interest in machine learning systems that can provide a human-understandable rationale for their predictions or decisions. However, exactly what kinds of explanation are truly human-interpretable remains poorly understood. This work advances our understanding of what makes explanations interpretable under three specific tasks that users may perform with machine lea…

    Submitted 28 August, 2019; v1 submitted 30 January, 2019; originally announced February 2019.

    Comments: arXiv admin note: substantial text overlap with arXiv:1802.00682

  5. arXiv:1805.11571  [pdf, other]

    stat.ML cs.LG

    Human-in-the-Loop Interpretability Prior

    Authors: Isaac Lage, Andrew Slavin Ross, Been Kim, Samuel J. Gershman, Finale Doshi-Velez

    Abstract: We often desire our models to be interpretable as well as accurate. Prior work on optimizing models for interpretability has relied on easy-to-quantify proxies for interpretability, such as sparsity or the number of operations required. In this work, we optimize for interpretability by directly including humans in the optimization loop. We develop an algorithm that minimizes the number of user stu…

    Submitted 30 October, 2018; v1 submitted 29 May, 2018; originally announced May 2018.

    Comments: To appear at NIPS 2018, selected for a spotlight. 13 pages (incl references and appendix)

  6. arXiv:1802.04302  [pdf, other]

    cs.CL stat.ML

    Evaluating Compositionality in Sentence Embeddings

    Authors: Ishita Dasgupta, Demi Guo, Andreas Stuhlmüller, Samuel J. Gershman, Noah D. Goodman

    Abstract: An important challenge for human-like AI is compositional semantics. Recent research has attempted to address this by using deep neural networks to learn vector space embeddings of sentences, which then serve as input to other tasks. We present a new dataset for one such task, `natural language inference' (NLI), that cannot be solved using only word-level knowledge and requires some compositionali…

    Submitted 17 May, 2018; v1 submitted 12 February, 2018; originally announced February 2018.

  7. arXiv:1711.01134  [pdf]

    cs.AI stat.ML

    Accountability of AI Under the Law: The Role of Explanation

    Authors: Finale Doshi-Velez, Mason Kortz, Ryan Budish, Chris Bavitz, Sam Gershman, David O'Brien, Kate Scott, Stuart Schieber, James Waldo, David Weinberger, Adrian Weller, Alexandra Wood

    Abstract: The ubiquity of systems using artificial intelligence or "AI" has brought increasing attention to how those systems should be regulated. The choice of how to regulate AI systems will require care. AI systems have the potential to synthesize large amounts of data, allowing for greater levels of personalization and precision than ever before---applications range from clinical decision support to aut…

    Submitted 20 December, 2019; v1 submitted 3 November, 2017; originally announced November 2017.

  8. arXiv:1606.02396  [pdf, other]

    stat.ML cs.AI cs.LG cs.NE

    Deep Successor Reinforcement Learning

    Authors: Tejas D. Kulkarni, Ardavan Saeedi, Simanta Gautam, Samuel J. Gershman

    Abstract: Learning robust value functions given raw observations and rewards is now possible with model-free and model-based deep reinforcement learning algorithms. There is a third alternative, called Successor Representations (SR), which decomposes the value function into two components -- a reward predictor and a successor map. The successor map represents the expected future state occupancy from any giv…

    Submitted 8 June, 2016; originally announced June 2016.

    Comments: 10 pages, 6 figures

  9. arXiv:1604.00289  [pdf, other]

    cs.AI cs.CV cs.LG cs.NE stat.ML

    Building Machines That Learn and Think Like People

    Authors: Brenden M. Lake, Tomer D. Ullman, Joshua B. Tenenbaum, Samuel J. Gershman

    Abstract: Recent progress in artificial intelligence (AI) has renewed interest in building systems that learn and think like people. Many advances have come from using deep neural networks trained end-to-end in tasks such as object recognition, video games, and board games, achieving performance that equals or even beats humans in some respects. Despite their biological inspiration and performance achieveme…

    Submitted 2 November, 2016; v1 submitted 1 April, 2016; originally announced April 2016.

    Comments: In press at Behavioral and Brain Sciences. Open call for commentary proposals (until Nov. 22, 2016). https://www.cambridge.org/core/journals/behavioral-and-brain-sciences/information/calls-for-commentary/open-calls-for-commentary

  10. arXiv:1604.00126  [pdf, other]

    cs.CL cs.IR cs.LG stat.ML

    Nonparametric Spherical Topic Modeling with Word Embeddings

    Authors: Kayhan Batmanghelich, Ardavan Saeedi, Karthik Narasimhan, Sam Gershman

    Abstract: Traditional topic models do not account for semantic regularities in language. Recent distributional representations of words exhibit semantic consistency over directional metrics such as cosine similarity. However, neither categorical nor Gaussian observational distributions used in existing topic models are appropriate to leverage such correlations. In this paper, we propose to use the von Mises…

    Submitted 1 April, 2016; originally announced April 2016.

  11. arXiv:1402.5715  [pdf, other]

    stat.ML cs.LG

    Variational Particle Approximations

    Authors: Ardavan Saeedi, Tejas D Kulkarni, Vikash Mansinghka, Samuel Gershman

    Abstract: Approximate inference in high-dimensional, discrete probabilistic models is a central problem in computational statistics and machine learning. This paper describes discrete particle variational inference (DPVI), a new approach that combines key strengths of Monte Carlo, variational and search-based techniques. DPVI is based on a novel family of particle-based variational approximations that can b…

    Submitted 5 December, 2015; v1 submitted 23 February, 2014; originally announced February 2014.

    Comments: First two authors contributed equally to this work

  12. arXiv:1206.4665  [pdf]

    cs.LG stat.ML

    Nonparametric variational inference

    Authors: Samuel Gershman, Matt Hoffman, David Blei

    Abstract: Variational methods are widely used for approximate posterior inference. However, their use is typically limited to families of distributions that enjoy particular conjugacy properties. To circumvent this limitation, we propose a family of variational approximations inspired by nonparametric kernel density estimation. The locations of these kernels and their bandwidth are treated as variational pa…

    Submitted 18 June, 2012; originally announced June 2012.

    Comments: ICML2012

  13. arXiv:1110.5454  [pdf, other]

    stat.ML math.ST

    Distance Dependent Infinite Latent Feature Models

    Authors: Samuel J. Gershman, Peter I. Frazier, David M. Blei

    Abstract: Latent feature models are widely used to decompose data into a small number of components. Bayesian nonparametric variants of these models, which use the Indian buffet process (IBP) as a prior over latent features, allow the number of features to be determined from the data. We present a generalization of the IBP, the distance dependent Indian buffet process (dd-IBP), for modeling non-exchangeable…

    Submitted 10 September, 2012; v1 submitted 25 October, 2011; originally announced October 2011.

    Comments: 28 pages, 9 figures

  14. arXiv:1106.2697  [pdf, other]

    stat.ML stat.ME

    A Tutorial on Bayesian Nonparametric Models

    Authors: Samuel J. Gershman, David M. Blei

    Abstract: A key problem in statistical modeling is model selection, how to choose a model at an appropriate level of complexity. This problem appears in many settings, most prominently in choosing the number of clusters in mixture models or the number of factors in factor analysis. In this tutorial we describe Bayesian nonparametric methods, a class of methods that side-steps this issue by allowing the data…

    Submitted 4 August, 2011; v1 submitted 14 June, 2011; originally announced June 2011.

    Comments: 28 pages, 8 figures