
Showing 1–12 of 12 results for author: Rimell, L

  1. arXiv:2212.09686

    cs.CL

    A Natural Bias for Language Generation Models

    Authors: Clara Meister, Wojciech Stokowiec, Tiago Pimentel, Lei Yu, Laura Rimell, Adhiguna Kuncoro

    Abstract: After just a few hundred training updates, a standard probabilistic model for language generation has likely not yet learnt many semantic or syntactic rules of natural language, making it difficult to estimate the probability distribution over next tokens. Yet around this point, these models have identified a simple, loss-minimising behaviour: to output the unigram distribution of the target train…

    Submitted 23 June, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: Main conference paper at ACL 2023

  2. arXiv:2112.11446

    cs.CL cs.AI

    Scaling Language Models: Methods, Analysis & Insights from Training Gopher

    Authors: Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan, Jacob Menick, Albin Cassirer, Richard Powell, George van den Driessche, Lisa Anne Hendricks, Maribeth Rauh, Po-Sen Huang, Amelia Glaese, Johannes Welbl, Sumanth Dathathri, Saffron Huang, Jonathan Uesato, John Mellor, et al. (55 additional authors not shown)

    Abstract: Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. In this paper, we present an analysis of Transformer-based language model performance across a wide range of model scales -- from models with tens of millions of parameters up to a 280 billion parameter model called Gop…

    Submitted 21 January, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

    Comments: 120 pages

  3. arXiv:2112.04359

    cs.CL cs.AI cs.CY

    Ethical and social risks of harm from Language Models

    Authors: Laura Weidinger, John Mellor, Maribeth Rauh, Conor Griffin, Jonathan Uesato, Po-Sen Huang, Myra Cheng, Mia Glaese, Borja Balle, Atoosa Kasirzadeh, Zac Kenton, Sasha Brown, Will Hawkins, Tom Stepleton, Courtney Biles, Abeba Birhane, Julia Haas, Laura Rimell, Lisa Anne Hendricks, William Isaac, Sean Legassick, Geoffrey Irving, Iason Gabriel

    Abstract: This paper aims to help structure the risk landscape associated with large-scale Language Models (LMs). In order to foster advances in responsible innovation, an in-depth understanding of the potential risks posed by these models is needed. A wide range of established and anticipated risks are analysed in detail, drawing on multidisciplinary expertise and literature from computer science, linguist…

    Submitted 8 December, 2021; originally announced December 2021.

  4. arXiv:2109.02550

    cs.CL

    You should evaluate your language model on marginal likelihood over tokenisations

    Authors: Kris Cao, Laura Rimell

    Abstract: Neural language models typically tokenise input text into sub-word units to achieve an open vocabulary. The standard approach is to use a single canonical tokenisation at both train and test time. We suggest that this approach is unsatisfactory and may bottleneck our evaluation of language model performance. Using only the one-best tokenisation ignores tokeniser uncertainty over alternative tokeni…

    Submitted 21 September, 2021; v1 submitted 6 September, 2021; originally announced September 2021.

    Comments: accepted at EMNLP 2021

  5. arXiv:2103.10518

    cs.CL cs.AI cs.LG

    Pretraining the Noisy Channel Model for Task-Oriented Dialogue

    Authors: Qi Liu, Lei Yu, Laura Rimell, Phil Blunsom

    Abstract: Direct decoding for task-oriented dialogue is known to suffer from the explaining-away effect, manifested in models that prefer short and generic responses. Here we argue for the use of Bayes' theorem to factorize the dialogue task into two models, the distribution of the context given the response, and the prior for the response itself. This approach, an instantiation of the noisy channel model,…

    Submitted 18 March, 2021; originally announced March 2021.

    Comments: Accepted to TACL, pre-MIT Press publication version

  6. arXiv:2006.01016

    cs.AI cs.CL cs.LG

    Probing Emergent Semantics in Predictive Agents via Question Answering

    Authors: Abhishek Das, Federico Carnevale, Hamza Merzic, Laura Rimell, Rosalia Schneider, Josh Abramson, Alden Hung, Arun Ahuja, Stephen Clark, Gregory Wayne, Felix Hill

    Abstract: Recent work has shown how predictive modeling can endow agents with rich knowledge of their surroundings, improving their ability to act in complex environments. We propose question-answering as a general paradigm to decode and understand the representations that such agents develop, applying our method to two recent approaches to predictive modeling: action-conditional CPC (Guo et al., 2018) and…

    Submitted 1 June, 2020; originally announced June 2020.

    Comments: ICML 2020

  7. arXiv:2005.13482

    cs.CL

    Syntactic Structure Distillation Pretraining For Bidirectional Encoders

    Authors: Adhiguna Kuncoro, Lingpeng Kong, Daniel Fried, Dani Yogatama, Laura Rimell, Chris Dyer, Phil Blunsom

    Abstract: Textual representation learners trained on large amounts of data have achieved notable success on downstream tasks; intriguingly, they have also performed well on challenging tests of syntactic competence. Given this success, it remains an open question whether scalable learners like BERT can become fully proficient in the syntax of natural language by virtue of data scale alone, or whether they s…

    Submitted 27 May, 2020; originally announced May 2020.

    Comments: 17 pages, 6 tables, 2 figures. AK and LK contributed equally

  8. arXiv:1909.11049

    cs.CL

    Neural Generative Rhetorical Structure Parsing

    Authors: Amandla Mabona, Laura Rimell, Stephen Clark, Andreas Vlachos

    Abstract: Rhetorical structure trees have been shown to be useful for several document-level tasks including summarization and document classification. Previous approaches to RST parsing have used discriminative models; however, these are less sample efficient than generative models, and RST parsing datasets are typically small. In this paper, we present the first generative model for RST parsing. Our model…

    Submitted 24 September, 2019; originally announced September 2019.

    Comments: EMNLP 2019

  9. arXiv:1906.06438

    cs.CL cs.LG

    Scalable Syntax-Aware Language Models Using Knowledge Distillation

    Authors: Adhiguna Kuncoro, Chris Dyer, Laura Rimell, Stephen Clark, Phil Blunsom

    Abstract: Prior work has shown that, on small amounts of training data, syntactic neural language models learn structurally sensitive generalisations more successfully than sequential language models. However, their computational complexity renders scaling difficult, and it remains an open question whether structural biases are still necessary when sequential models have access to ever larger amounts of tra…

    Submitted 14 June, 2019; originally announced June 2019.

    Comments: ACL 2019

  10. arXiv:1608.01018   

    cs.CL cs.AI math.CT quant-ph

    Proceedings of the 2016 Workshop on Semantic Spaces at the Intersection of NLP, Physics and Cognitive Science

    Authors: Dimitrios Kartsaklis, Martha Lewis, Laura Rimell

    Abstract: This volume contains the Proceedings of the 2016 Workshop on Semantic Spaces at the Intersection of NLP, Physics and Cognitive Science (SLPCS 2016), which was held on the 11th of June at the University of Strathclyde, Glasgow, and was co-located with Quantum Physics and Logic (QPL 2016). Exploiting the common ground provided by the concept of a vector space, the workshop brought together researche…

    Submitted 2 August, 2016; originally announced August 2016.

    Journal ref: EPTCS 221, 2016

  11. arXiv:1509.01692

    cs.CL

    Take and Took, Gaggle and Goose, Book and Read: Evaluating the Utility of Vector Differences for Lexical Relation Learning

    Authors: Ekaterina Vylomova, Laura Rimell, Trevor Cohn, Timothy Baldwin

    Abstract: Recent work on word embeddings has shown that simple vector subtraction over pre-trained embeddings is surprisingly effective at capturing different lexical relations, despite lacking explicit supervision. Prior work has evaluated this intriguing result using a word analogy prediction formulation and hand-selected relations, but the generality of the finding over a broader range of lexical relatio…

    Submitted 13 August, 2016; v1 submitted 5 September, 2015; originally announced September 2015.

  12. arXiv:1411.7942

    cs.CL

    Using Sentence Plausibility to Learn the Semantics of Transitive Verbs

    Authors: Tamara Polajnar, Laura Rimell, Stephen Clark

    Abstract: The functional approach to compositional distributional semantics considers transitive verbs to be linear maps that transform the distributional vectors representing nouns into a vector representing a sentence. We conduct an initial investigation that uses a matrix consisting of the parameters of a logistic regression classifier trained on a plausibility task as a transitive verb function. We comp…

    Submitted 12 December, 2014; v1 submitted 28 November, 2014; originally announced November 2014.

    Comments: Full updated paper for NIPS learning semantics workshop, with some minor errata fixed
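
The abstract of item 1 (A Natural Bias for Language Generation Models) says that early in training, models settle on outputting the unigram distribution of the target training set because it is loss-minimising. A minimal Python sketch of why this holds, assuming a hypothetical toy corpus: among distributions that ignore context entirely, the empirical unigram distribution gives the lowest average cross-entropy on that corpus (Gibbs' inequality), so here it beats a uniform distribution over the same vocabulary. This illustrates the general property only and is not the paper's method.

```python
import math
from collections import Counter


def unigram_distribution(tokens):
    """Relative frequency of each token type in a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def cross_entropy(tokens, dist):
    """Average negative log-likelihood (in nats) of `tokens` under a fixed,
    context-independent distribution `dist`."""
    return -sum(math.log(dist[t]) for t in tokens) / len(tokens)


# Hypothetical toy "target training set" (13 tokens, 8 types).
corpus = "the cat sat on the mat and the dog sat on the rug".split()
vocab = sorted(set(corpus))

unigram = unigram_distribution(corpus)
uniform = {tok: 1 / len(vocab) for tok in vocab}

ce_unigram = cross_entropy(corpus, unigram)
ce_uniform = cross_entropy(corpus, uniform)
print(f"unigram: {ce_unigram:.3f} nats, uniform: {ce_uniform:.3f} nats")
```

Any other context-free distribution over these types would also score worse than `unigram` on this corpus; equality holds only when the two distributions coincide.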
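
Item 4 (You should evaluate your language model on marginal likelihood over tokenisations) contrasts the probability of a string under its single one-best tokenisation with the probability summed over all of its tokenisations. A minimal sketch, assuming a hypothetical handful of subword pieces taken from a larger vocabulary, each with a context-independent probability: under that simplification the sum can be computed exactly with a forward-style dynamic program, and it can never be smaller than the one-best probability. A neural language model conditions each token on its prefix, so this exact recursion does not carry over directly; the truncated abstract does not say how the paper estimates the marginal.

```python
import math

# Hypothetical subword pieces with context-independent probabilities.
# They are a subset of an imagined larger vocabulary, so they need not sum to 1.
PIECES = {"un": 0.10, "u": 0.05, "n": 0.05, "related": 0.02,
          "relate": 0.03, "d": 0.04, "rel": 0.02, "ated": 0.01}


def marginal_prob(text):
    """Sum over every segmentation of `text` into PIECES of the product
    of piece probabilities (forward algorithm over character positions)."""
    # alpha[i] = total probability of all segmentations of text[:i]
    alpha = [0.0] * (len(text) + 1)
    alpha[0] = 1.0
    for end in range(1, len(text) + 1):
        for start in range(end):
            piece = text[start:end]
            if piece in PIECES:
                alpha[end] += alpha[start] * PIECES[piece]
    return alpha[-1]


def best_prob(text):
    """Probability of the single most probable segmentation (Viterbi)."""
    best = [0.0] * (len(text) + 1)
    best[0] = 1.0
    for end in range(1, len(text) + 1):
        for start in range(end):
            piece = text[start:end]
            if piece in PIECES:
                best[end] = max(best[end], best[start] * PIECES[piece])
    return best[-1]


word = "unrelated"
print(f"one-best log-prob: {math.log(best_prob(word)):.4f}")
print(f"marginal log-prob: {math.log(marginal_prob(word)):.4f}")
```

With these toy pieces, "unrelated" has six segmentations ({un, u n} × {related, relate d, rel ated}); the marginal is 0.1025 × 0.0214 ≈ 0.00219, slightly above the one-best probability of 0.002.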
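
Item 5 (Pretraining the Noisy Channel Model for Task-Oriented Dialogue) describes using Bayes' theorem to split the dialogue task into a model of the context given the response and a prior over the response itself. Writing c for the dialogue context and r for a candidate response (notation assumed here, not taken from the paper), the factorisation and the resulting decoding objective are:

```latex
p(r \mid c) = \frac{p(c \mid r)\, p(r)}{p(c)}
\qquad\Longrightarrow\qquad
\hat{r} = \arg\max_{r} \; p(c \mid r)\, p(r)
```

Because p(c) does not depend on r, it drops out of the arg max, leaving only the channel model p(c | r) and the response prior p(r).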
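
Item 11 (Take and Took, Gaggle and Goose, Book and Read) evaluates vector differences between pre-trained word embeddings as representations of lexical relations. A minimal sketch of computing such difference vectors and comparing them by cosine similarity, assuming hypothetical 3-dimensional vectors invented for illustration. The values are chosen so that the two past-tense pairs share an offset, so the output demonstrates the computation only and says nothing about real embeddings or the paper's findings.

```python
import math

# Hypothetical toy embeddings; real pre-trained vectors have hundreds of
# dimensions and would be loaded from a file.
EMB = {
    "take": [0.9, 0.1, 0.3], "took": [0.9, 0.6, 0.2],
    "walk": [0.2, 0.1, 0.8], "walked": [0.2, 0.6, 0.7],
    "goose": [0.5, 0.9, 0.1], "gaggle": [0.1, 0.2, 0.9],
}


def diff(a, b):
    """Difference vector EMB[b] - EMB[a], representing the relation a -> b."""
    return [y - x for x, y in zip(EMB[a], EMB[b])]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


past1 = diff("take", "took")           # past-tense relation
past2 = diff("walk", "walked")         # past-tense relation
collective = diff("goose", "gaggle")   # collective-noun relation
print(f"past vs past:       {cosine(past1, past2):.3f}")
print(f"past vs collective: {cosine(past1, collective):.3f}")
```

Difference vectors like these can then serve as features for grouping or classifying word pairs by relation.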