Showing 1–19 of 19 results for author: Ash, T

  1. arXiv:2405.19153  [pdf, other]

    cs.LG cs.AI

    A Study of Plasticity Loss in On-Policy Deep Reinforcement Learning

    Authors: Arthur Juliani, Jordan T. Ash

    Abstract: Continual learning with deep neural networks presents challenges distinct from both the fixed-dataset and convex continual learning regimes. One such challenge is plasticity loss, wherein a neural network trained in an online fashion displays a degraded ability to fit new tasks. This problem has been extensively studied in both supervised learning and off-policy reinforcement learning (RL), where…

    Submitted 29 May, 2024; originally announced May 2024.

  2. arXiv:2401.06692  [pdf, other]

    cs.CL cs.AI cs.LG

    An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models

    Authors: Gantavya Bhatt, Yifang Chen, Arnav M. Das, Jifan Zhang, Sang T. Truong, Stephen Mussmann, Yinglun Zhu, Jeffrey Bilmes, Simon S. Du, Kevin Jamieson, Jordan T. Ash, Robert D. Nowak

    Abstract: Supervised finetuning (SFT) on instruction datasets has played a crucial role in achieving the remarkable zero-shot generalization capabilities observed in modern large language models (LLMs). However, the annotation efforts required to produce high quality responses for instructions are becoming prohibitively expensive, especially as the number of tasks spanned by instruction datasets continues t…

    Submitted 6 May, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

  3. arXiv:2312.13558  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction

    Authors: Pratyusha Sharma, Jordan T. Ash, Dipendra Misra

    Abstract: Transformer-based Large Language Models (LLMs) have become a fixture in modern machine learning. Correspondingly, significant resources are allocated towards research that aims to further advance this technology, typically resulting in models of increasing size that are trained on increasing amounts of data. This work, however, demonstrates the surprising result that it is often possible to signif…

    Submitted 20 December, 2023; originally announced December 2023.

  4. arXiv:2306.00946  [pdf, other]

    cs.LG cs.CL

    Exposing Attention Glitches with Flip-Flop Language Modeling

    Authors: Bingbin Liu, Jordan T. Ash, Surbhi Goel, Akshay Krishnamurthy, Cyril Zhang

    Abstract: Why do large language models sometimes output factual inaccuracies and exhibit erroneous reasoning? The brittleness of these models, particularly when executing long chains of reasoning, currently seems to be an inevitable price to pay for their advanced capabilities of coherently synthesizing knowledge, pragmatics, and abstract thought. Towards making sense of this fundamentally unsolved problem,…

    Submitted 30 October, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: v2: NeurIPS 2023 camera-ready + data release

  5. arXiv:2303.02535  [pdf, other]

    cs.LG

    Streaming Active Learning with Deep Neural Networks

    Authors: Akanksha Saran, Safoora Yousefi, Akshay Krishnamurthy, John Langford, Jordan T. Ash

    Abstract: Active learning is perhaps most naturally posed as an online learning problem. However, prior active learning approaches with deep neural networks assume offline access to the entire dataset ahead of time. This paper proposes VeSSAL, a new algorithm for batch active learning with deep neural networks in streaming settings, which samples groups of points to query for labels at the moment they are e…

    Submitted 6 June, 2023; v1 submitted 4 March, 2023; originally announced March 2023.

    Comments: ICML 2023

  6. arXiv:2211.00928  [pdf, ps, other]

    cs.LG cs.AI

    Neural Active Learning on Heteroskedastic Distributions

    Authors: Savya Khosla, Chew Kin Whye, Jordan T. Ash, Cyril Zhang, Kenji Kawaguchi, Alex Lamb

    Abstract: Models that can actively seek out the best quality training data hold the promise of more accurate, adaptable, and efficient machine learning. Active learning techniques often tend to prefer examples that are the most difficult to classify. While this works well on homogeneous datasets, we find that it can lead to catastrophic failures when performed on multiple distributions with different degree…

    Submitted 23 July, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

  7. arXiv:2210.14077  [pdf, other]

    cs.LG

    Eigen Memory Trees

    Authors: Mark Rucker, Jordan T. Ash, John Langford, Paul Mineiro, Ida Momennejad

    Abstract: This work introduces the Eigen Memory Tree (EMT), a novel online memory model for sequential learning scenarios. EMTs store data at the leaves of a binary tree and route new samples through the structure using the principal components of previous experiences, facilitating efficient (logarithmic) access to relevant memories. We demonstrate that EMT outperforms existing online memory approaches, and…

    Submitted 31 October, 2022; v1 submitted 25 October, 2022; originally announced October 2022.

    Comments: corrected an author name; corrected title plurality

  8. arXiv:2210.10749  [pdf, other]

    cs.LG cs.FL stat.ML

    Transformers Learn Shortcuts to Automata

    Authors: Bingbin Liu, Jordan T. Ash, Surbhi Goel, Akshay Krishnamurthy, Cyril Zhang

    Abstract: Algorithmic reasoning requires capabilities which are most naturally understood through recurrent models of computation, like the Turing machine. However, Transformer models, while lacking recurrence, are able to perform such reasoning using far fewer layers than the number of reasoning steps. This raises the question: what solutions are learned by these shallow and non-recurrent models? We find t…

    Submitted 2 May, 2023; v1 submitted 19 October, 2022; originally announced October 2022.

  9. arXiv:2110.11202  [pdf, other]

    cs.LG

    Anti-Concentrated Confidence Bonuses for Scalable Exploration

    Authors: Jordan T. Ash, Cyril Zhang, Surbhi Goel, Akshay Krishnamurthy, Sham Kakade

    Abstract: Intrinsic rewards play a central role in handling the exploration-exploitation trade-off when designing sequential decision-making algorithms, in both foundational theory and state-of-the-art deep reinforcement learning. The LinUCB algorithm, a centerpiece of the stochastic linear bandits literature, prescribes an elliptical bonus which addresses the challenge of leveraging shared information in l…

    Submitted 11 April, 2022; v1 submitted 21 October, 2021; originally announced October 2021.

    Journal ref: International Conference on Learning Representations 2022

  10. arXiv:2106.09943  [pdf, other]

    cs.LG cs.CL stat.ML

    Investigating the Role of Negatives in Contrastive Representation Learning

    Authors: Jordan T. Ash, Surbhi Goel, Akshay Krishnamurthy, Dipendra Misra

    Abstract: Noise contrastive learning is a popular technique for unsupervised representation learning. In this approach, a representation is obtained via reduction to supervised learning, where given a notion of semantic similarity, the learner tries to distinguish a similar (positive) example from a collection of random (negative) examples. The success of modern contrastive learning pipelines relies on many…

    Submitted 18 June, 2021; originally announced June 2021.

  11. arXiv:2106.09675  [pdf, other]

    cs.LG

    Gone Fishing: Neural Active Learning with Fisher Embeddings

    Authors: Jordan T. Ash, Surbhi Goel, Akshay Krishnamurthy, Sham Kakade

    Abstract: There is an increasing need for effective active learning algorithms that are compatible with deep neural networks. This paper motivates and revisits a classic, Fisher-based active selection objective, and proposes BAIT, a practical, tractable, and high-performing algorithm that makes it viable for use with neural models. BAIT draws inspiration from the theoretical analysis of maximum likelihood e…

    Submitted 14 December, 2021; v1 submitted 17 June, 2021; originally announced June 2021.

  12. arXiv:2005.06549  [pdf, other]

    cs.LG stat.ML

    Learning Composable Energy Surrogates for PDE Order Reduction

    Authors: Alex Beatson, Jordan T. Ash, Geoffrey Roeder, Tianju Xue, Ryan P. Adams

    Abstract: Meta-materials are an important emerging class of engineered materials in which complex macroscopic behaviour--whether electromagnetic, thermal, or mechanical--arises from modular substructure. Simulation and optimization of these materials are computationally challenging, as rich substructures necessitate high-fidelity finite element meshes to solve the governing PDEs. To address this, we leverag…

    Submitted 15 May, 2020; v1 submitted 13 May, 2020; originally announced May 2020.

  13. arXiv:2002.08111  [pdf, other]

    cs.LG cs.CV cs.NE stat.ML

    Hierarchical Quantized Autoencoders

    Authors: Will Williams, Sam Ringer, Tom Ash, John Hughes, David MacLeod, Jamie Dougherty

    Abstract: Despite progress in training neural networks for lossy image compression, current approaches fail to maintain both perceptual quality and abstract features at very low bitrates. Encouraged by recent success in learning discrete representations with Vector Quantized Variational Autoencoders (VQ-VAEs), we motivate the use of a hierarchy of VQ-VAEs to attain high factors of compression. We show that…

    Submitted 16 October, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

  14. arXiv:1910.08519  [pdf, other]

    cs.LG cs.CV stat.ML

    Texture Bias Of CNNs Limits Few-Shot Classification Performance

    Authors: Sam Ringer, Will Williams, Tom Ash, Remi Francis, David MacLeod

    Abstract: Accurate image classification given small amounts of labelled data (few-shot classification) remains an open problem in computer vision. In this work we examine how the known texture bias of Convolutional Neural Networks (CNNs) affects few-shot classification performance. Although texture bias can help in standard image classification, in this work we show it significantly harms few-shot classific…

    Submitted 18 October, 2019; originally announced October 2019.

  15. arXiv:1910.08475  [pdf, other]

    cs.LG cs.NE stat.ML

    On Warm-Starting Neural Network Training

    Authors: Jordan T. Ash, Ryan P. Adams

    Abstract: In many real-world deployments of machine learning systems, data arrive piecemeal. These learning scenarios may be passive, where data arrive incrementally due to structural properties of the problem (e.g., daily financial data) or active, where samples are selected according to a measure of their quality (e.g., experimental design). In both of these cases, we are building a sequence of models tha…

    Submitted 31 December, 2020; v1 submitted 18 October, 2019; originally announced October 2019.

    Journal ref: 2020 Advances in Neural Information Processing Systems

  16. arXiv:1906.03671  [pdf, other]

    cs.LG stat.ML

    Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds

    Authors: Jordan T. Ash, Chicheng Zhang, Akshay Krishnamurthy, John Langford, Alekh Agarwal

    Abstract: We design a new algorithm for batch active learning with deep neural network models. Our algorithm, Batch Active learning by Diverse Gradient Embeddings (BADGE), samples groups of points that are disparate and high-magnitude when represented in a hallucinated gradient space, a strategy designed to incorporate both predictive uncertainty and sample diversity into every selected batch. Crucially, BA…

    Submitted 23 February, 2020; v1 submitted 9 June, 2019; originally announced June 2019.

    Journal ref: 2020 International Conference on Learning Representations

  17. arXiv:1811.02528  [pdf, other]

    cs.CL cs.LG

    Discriminative training of RNNLMs with the average word error criterion

    Authors: Rémi Francis, Tom Ash, Will Williams

    Abstract: In automatic speech recognition (ASR), recurrent neural language models (RNNLM) are typically used to refine hypotheses in the form of lattices or n-best lists, which are generated by a beam search decoder with a weaker language model. The RNNLMs are usually trained generatively using the perplexity (PPL) criterion on large corpora of grammatically correct text. However, the hypotheses are noisy,…

    Submitted 8 November, 2018; v1 submitted 6 November, 2018; originally announced November 2018.

    Comments: Submitted to ICASSP 2019

  18. arXiv:1602.04889  [pdf, other]

    cs.LG cs.AI

    Unsupervised Domain Adaptation Using Approximate Label Matching

    Authors: Jordan T. Ash, Robert E. Schapire, Barbara E. Engelhardt

    Abstract: Domain adaptation addresses the problem created when training data is generated by a so-called source distribution, but test data is generated by a significantly different target distribution. In this work, we present approximate label matching (ALM), a new unsupervised domain adaptation technique that creates and leverages a rough labeling on the test samples, then uses these noisy labels to lear…

    Submitted 1 March, 2017; v1 submitted 15 February, 2016; originally announced February 2016.

  19. arXiv:1502.00512  [pdf, other]

    cs.CL cs.LG

    Scaling Recurrent Neural Network Language Models

    Authors: Will Williams, Niranjani Prasad, David Mrva, Tom Ash, Tony Robinson

    Abstract: This paper investigates the scaling properties of Recurrent Neural Network Language Models (RNNLMs). We discuss how to train very large RNNs on GPUs and address the questions of how RNNLMs scale with respect to model size, training-set size, computational costs and memory. Our analysis shows that despite being more costly to train, RNNLMs obtain much lower perplexities on standard benchmarks than…

    Submitted 2 February, 2015; originally announced February 2015.