
Showing 1–12 of 12 results for author: Frosst, N

  1. arXiv:2405.15032  [pdf, other]

    cs.CL

    Aya 23: Open Weight Releases to Further Multilingual Progress

    Authors: Viraat Aryabumi, John Dang, Dwarak Talupuru, Saurabh Dash, David Cairuz, Hangyu Lin, Bharat Venkitesh, Madeline Smith, Jon Ander Campos, Yi Chern Tan, Kelly Marchisio, Max Bartolo, Sebastian Ruder, Acyr Locatelli, Julia Kreutzer, Nick Frosst, Aidan Gomez, Phil Blunsom, Marzieh Fadaee, Ahmet Üstün, Sara Hooker

    Abstract: This technical report introduces Aya 23, a family of multilingual language models. Aya 23 builds on the recent release of the Aya model (Üstün et al., 2024), focusing on pairing a highly performant pre-trained model with the recently released Aya collection (Singh et al., 2024). The result is a powerful multilingual large language model serving 23 languages, expanding state-of-art language modelin…

    Submitted 31 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  2. arXiv:2110.12609  [pdf, other]

    cs.CL cs.LG

    No News is Good News: A Critique of the One Billion Word Benchmark

    Authors: Helen Ngo, João G. M. Araújo, Jeffrey Hui, Nicholas Frosst

    Abstract: The One Billion Word Benchmark is a dataset derived from the WMT 2011 News Crawl, commonly used to measure language modeling ability in natural language processing. We train models solely on Common Crawl web scrapes partitioned by year, and demonstrate that they perform worse on this task over time due to distributional shift. Analysis of this corpus reveals that it contains several examples of ha…

    Submitted 24 October, 2021; originally announced October 2021.

  3. arXiv:2108.07790  [pdf, other]

    cs.CL cs.LG

    Mitigating harm in language models with conditional-likelihood filtration

    Authors: Helen Ngo, Cooper Raterink, João G. M. Araújo, Ivan Zhang, Carol Chen, Adrien Morisot, Nicholas Frosst

    Abstract: Language models trained on large-scale unfiltered datasets curated from the open web acquire systemic biases, prejudices, and harmful views from their training data. We present a methodology for programmatically identifying and removing harmful text from web-scale datasets. A pretrained language model is used to calculate the log-likelihood of researcher-written trigger phrases conditioned on a sp…

    Submitted 27 November, 2021; v1 submitted 4 August, 2021; originally announced August 2021.

  4. arXiv:2010.04116  [pdf, other]

    cs.LG cs.AI

    Interlocking Backpropagation: Improving depthwise model-parallelism

    Authors: Aidan N. Gomez, Oscar Key, Kuba Perlin, Stephen Gou, Nick Frosst, Jeff Dean, Yarin Gal

    Abstract: The number of parameters in state of the art neural networks has drastically increased in recent years. This surge of interest in large scale neural networks has motivated the development of new distributed training strategies enabling such models. One such strategy is model-parallel distributed training. Unfortunately, model-parallelism can suffer from poor resource utilisation, which leads to wa…

    Submitted 7 July, 2022; v1 submitted 8 October, 2020; originally announced October 2020.

  5. arXiv:2004.13912  [pdf, other]

    cs.LG cs.AI stat.ML

    Neural Additive Models: Interpretable Machine Learning with Neural Nets

    Authors: Rishabh Agarwal, Levi Melnick, Nicholas Frosst, Xuezhou Zhang, Ben Lengerich, Rich Caruana, Geoffrey Hinton

    Abstract: Deep neural networks (DNNs) are powerful black-box predictors that have achieved impressive performance on a wide variety of tasks. However, their accuracy comes at the cost of intelligibility: it is usually unclear how they make their decisions. This hinders their applicability to high stakes decision-making domains such as healthcare. We propose Neural Additive Models (NAMs) which combine some o…

    Submitted 24 October, 2021; v1 submitted 28 April, 2020; originally announced April 2020.

    Comments: Spotlight (Top 3%) at NeurIPS 2021

  6. arXiv:2002.07405  [pdf, other]

    cs.LG cs.CV stat.ML

    Deflecting Adversarial Attacks

    Authors: Yao Qin, Nicholas Frosst, Colin Raffel, Garrison Cottrell, Geoffrey Hinton

    Abstract: There has been an ongoing cycle where stronger defenses against adversarial attacks are subsequently broken by a more advanced defense-aware attack. We present a new approach towards ending this cycle where we "deflect" adversarial attacks by causing the attacker to produce an input that semantically resembles the attack's target class. To this end, we first propose a stronger defense based on Ca…

    Submitted 18 February, 2020; originally announced February 2020.

  7. arXiv:1907.02957  [pdf, other]

    cs.LG cs.CR cs.CV stat.ML

    Detecting and Diagnosing Adversarial Images with Class-Conditional Capsule Reconstructions

    Authors: Yao Qin, Nicholas Frosst, Sara Sabour, Colin Raffel, Garrison Cottrell, Geoffrey Hinton

    Abstract: Adversarial examples raise questions about whether neural network models are sensitive to the same visual features as humans. In this paper, we first detect adversarial examples or otherwise corrupted images based on a class-conditional reconstruction of the input. To specifically attack our detection mechanism, we propose the Reconstructive Attack which seeks both to cause a misclassification and…

    Submitted 18 February, 2020; v1 submitted 5 July, 2019; originally announced July 2019.

    Journal ref: ICLR 2020

  8. arXiv:1902.01889  [pdf, other]

    stat.ML cs.LG

    Analyzing and Improving Representations with the Soft Nearest Neighbor Loss

    Authors: Nicholas Frosst, Nicolas Papernot, Geoffrey Hinton

    Abstract: We explore and expand the Soft Nearest Neighbor Loss to measure the entanglement of class manifolds in representation space: i.e., how close pairs of points from the same class are relative to pairs of points from different classes. We demonstrate several use cases of the loss. As an analytical tool, it provides insights into the evolution of class similarity structures durin…

    Submitted 5 February, 2019; originally announced February 2019.

  9. arXiv:1812.08848  [pdf, other]

    cs.CV

    SMILER: Saliency Model Implementation Library for Experimental Research

    Authors: Calden Wloka, Toni Kunić, Iuliia Kotseruba, Ramin Fahimi, Nicholas Frosst, Neil D. B. Bruce, John K. Tsotsos

    Abstract: The Saliency Model Implementation Library for Experimental Research (SMILER) is a new software package which provides an open, standardized, and extensible framework for maintaining and executing computational saliency models. This work drastically reduces the human effort required to apply saliency algorithms to new tasks and datasets, while also ensuring consistency and procedural correctness fo…

    Submitted 20 December, 2018; originally announced December 2018.

  10. arXiv:1811.06969  [pdf, other]

    cs.LG cs.CR cs.CV stat.ML

    DARCCC: Detecting Adversaries by Reconstruction from Class Conditional Capsules

    Authors: Nicholas Frosst, Sara Sabour, Geoffrey Hinton

    Abstract: We present a simple technique that allows capsule models to detect adversarial images. In addition to being trained to classify images, the capsule model is trained to reconstruct the images from the pose parameters and identity of the correct top-level capsule. Adversarial images do not look like a typical member of the predicted class and they have much larger reconstruction errors when the reco…

    Submitted 16 November, 2018; originally announced November 2018.

    Comments: To be presented at NIPS 2018 Workshop on Security in Machine Learning

  11. arXiv:1711.09784  [pdf, other]

    cs.LG cs.AI stat.ML

    Distilling a Neural Network Into a Soft Decision Tree

    Authors: Nicholas Frosst, Geoffrey Hinton

    Abstract: Deep neural networks have proved to be a very effective way to perform classification tasks. They excel when the input data is high dimensional, the relationship between the input and the output is complicated, and the number of labeled training examples is large. But it is hard to explain why a learned network makes a particular classification decision on a particular test case. This is due to th…

    Submitted 27 November, 2017; originally announced November 2017.

    Comments: Presented at the CEX workshop at the AI*IA 2017 conference

  12. arXiv:1710.09829  [pdf, other]

    cs.CV

    Dynamic Routing Between Capsules

    Authors: Sara Sabour, Nicholas Frosst, Geoffrey E Hinton

    Abstract: A capsule is a group of neurons whose activity vector represents the instantiation parameters of a specific type of entity such as an object or an object part. We use the length of the activity vector to represent the probability that the entity exists and its orientation to represent the instantiation parameters. Active capsules at one level make predictions, via transformation matrices, for the…

    Submitted 7 November, 2017; v1 submitted 26 October, 2017; originally announced October 2017.