Showing 1–23 of 23 results for author: Bergen, B K

  1. arXiv:2406.14737  [pdf, other]

    cs.CL

    Dissecting the Ullman Variations with a SCALPEL: Why do LLMs fail at Trivial Alterations to the False Belief Task?

    Authors: Zhiqiang Pi, Annapurna Vadaparty, Benjamin K. Bergen, Cameron R. Jones

    Abstract: Recent empirical results have sparked a debate about whether or not Large Language Models (LLMs) are capable of Theory of Mind (ToM). While some have found LLMs to be successful on ToM evaluations such as the False Belief task (Kosinski, 2023), others have argued that LLMs solve these tasks by exploiting spurious correlations -- not representing beliefs -- since they fail on trivial alterations to…

    Submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2405.08007  [pdf, other]

    cs.HC cs.AI

    People cannot distinguish GPT-4 from a human in a Turing test

    Authors: Cameron R. Jones, Benjamin K. Bergen

    Abstract: We evaluated 3 systems (ELIZA, GPT-3.5 and GPT-4) in a randomized, controlled, and preregistered Turing test. Human participants had a 5 minute conversation with either a human or an AI, and judged whether or not they thought their interlocutor was human. GPT-4 was judged to be a human 54% of the time, outperforming ELIZA (22%) but lagging behind actual humans (67%). The results provide the first…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 23 pages, 13 figures

  3. arXiv:2404.19178  [pdf, other]

    cs.CL

    Revenge of the Fallen? Recurrent Models Match Transformers at Predicting Human Language Comprehension Metrics

    Authors: James A. Michaelov, Catherine Arnett, Benjamin K. Bergen

    Abstract: Transformers have supplanted Recurrent Neural Networks as the dominant architecture for both natural language processing tasks and, despite criticisms of cognitive implausibility, for modelling the effect of predictability on online human language comprehension. However, two recently developed recurrent neural network architectures, RWKV and Mamba, appear to perform natural language tasks comparab…

    Submitted 29 April, 2024; originally announced April 2024.

  4. arXiv:2403.00686  [pdf, other]

    cs.CL

    A Bit of a Problem: Measurement Disparities in Dataset Sizes Across Languages

    Authors: Catherine Arnett, Tyler A. Chang, Benjamin K. Bergen

    Abstract: How should text dataset sizes be compared across languages? Even for content-matched (parallel) corpora, UTF-8 encoded text can require a dramatically different number of bytes for different languages. In our work, we define the byte premium between two languages as the ratio of bytes used to encode content-matched text in those languages. We compute byte premiums for 1155 languages, and we use li…

    Submitted 1 March, 2024; originally announced March 2024.

  5. arXiv:2311.09205  [pdf, other]

    cs.CL

    When Is Multilinguality a Curse? Language Modeling for 250 High- and Low-Resource Languages

    Authors: Tyler A. Chang, Catherine Arnett, Zhuowen Tu, Benjamin K. Bergen

    Abstract: Multilingual language models are widely used to extend NLP systems to low-resource languages. However, concrete evidence for the effects of multilinguality on language modeling performance in individual languages remains scarce. Here, we pre-train over 10,000 monolingual and multilingual language models for over 250 languages, including multiple language families that are under-studied in NLP. We…

    Submitted 15 November, 2023; originally announced November 2023.

  6. arXiv:2311.09194  [pdf, other]

    cs.CL

    Structural Priming Demonstrates Abstract Grammatical Representations in Multilingual Language Models

    Authors: James A. Michaelov, Catherine Arnett, Tyler A. Chang, Benjamin K. Bergen

    Abstract: Abstract grammatical knowledge - of parts of speech and grammatical patterns - is key to the capacity for linguistic generalization in humans. But how abstract is grammatical knowledge in large language models? In the human literature, compelling evidence for grammatical abstraction comes from structural priming. A sentence that shares the same grammatical structure as a preceding sentence is proc…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: Accepted at EMNLP 2023

  7. arXiv:2310.20216  [pdf, other]

    cs.AI cs.CL

    Does GPT-4 pass the Turing test?

    Authors: Cameron R. Jones, Benjamin K. Bergen

    Abstract: We evaluated GPT-4 in a public online Turing test. The best-performing GPT-4 prompt passed in 49.7% of games, outperforming ELIZA (22%) and GPT-3.5 (20%), but falling short of the baseline set by human participants (66%). Participants' decisions were based mainly on linguistic style (35%) and socioemotional traits (27%), supporting the idea that intelligence, narrowly conceived, is not sufficient…

    Submitted 20 April, 2024; v1 submitted 31 October, 2023; originally announced October 2023.

    Comments: 28 pages, 21 figures

  8. arXiv:2310.07929  [pdf, other]

    cs.CL

    Crosslingual Structural Priming and the Pre-Training Dynamics of Bilingual Language Models

    Authors: Catherine Arnett, Tyler A. Chang, James A. Michaelov, Benjamin K. Bergen

    Abstract: Do multilingual language models share abstract grammatical representations across languages, and if so, when do these develop? Following Sinclair et al. (2022), we use structural priming to test for abstract grammatical representations with causal effects on model outputs. We extend the approach to a Dutch-English bilingual setting, and we evaluate a Dutch-English language model during pre-trainin…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: Extended abstract accepted to the 3rd Multilingual Representation Learning workshop at EMNLP 2023

  9. arXiv:2308.15419  [pdf, other]

    cs.CL

    Characterizing Learning Curves During Language Model Pre-Training: Learning, Forgetting, and Stability

    Authors: Tyler A. Chang, Zhuowen Tu, Benjamin K. Bergen

    Abstract: How do language models learn to make predictions during pre-training? To study this question, we extract learning curves from five autoregressive English language model pre-training runs, for 1M tokens in context. We observe that the language models generate short repetitive phrases before learning to generate longer and more coherent text. We quantify the final surprisal, within-run variability,…

    Submitted 29 August, 2023; originally announced August 2023.

  10. arXiv:2305.14681  [pdf, other]

    cs.CL

    Emergent inabilities? Inverse scaling over the course of pretraining

    Authors: James A. Michaelov, Benjamin K. Bergen

    Abstract: Does inverse scaling only occur as a function of model size, or can it also occur over the course of training? We carry out an exploratory study investigating whether the performance of language models on specific tasks can decrease (while general performance remains high) during training on the language modeling task. We find 8 tasks on which Pythia 12B (Biderman et al., 2023) shows decreased per…

    Submitted 15 November, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted to Findings of EMNLP 2023

  11. arXiv:2303.11504  [pdf, ps, other]

    cs.CL

    Language Model Behavior: A Comprehensive Survey

    Authors: Tyler A. Chang, Benjamin K. Bergen

    Abstract: Transformer language models have received widespread public attention, yet their generated text is often surprising even to NLP researchers. In this survey, we discuss over 250 recent studies of English language model behavior before task-specific fine-tuning. Language models possess basic capabilities in syntax, semantics, pragmatics, world knowledge, and reasoning, but these capabilities are sen…

    Submitted 25 August, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

    Comments: 32 pages, accepted to Computational Linguistics

  12. arXiv:2301.08731  [pdf, other]

    cs.CL

    Can Peanuts Fall in Love with Distributional Semantics?

    Authors: James A. Michaelov, Seana Coulson, Benjamin K. Bergen

    Abstract: Context changes expectations about upcoming words - following a story involving an anthropomorphic peanut, comprehenders expect the sentence the peanut was in love more than the peanut was salted, as indexed by N400 amplitude (Nieuwland & van Berkum, 2006). This updating of expectations has been explained using Situation Models - mental representations of a described event. However, recent work sh…

    Submitted 22 May, 2023; v1 submitted 20 January, 2023; originally announced January 2023.

    Comments: Accepted at CogSci 2023

  13. arXiv:2212.08700  [pdf, other]

    cs.CL cs.AI cs.LG

    Rarely a problem? Language models exhibit inverse scaling in their predictions following few-type quantifiers

    Authors: James A. Michaelov, Benjamin K. Bergen

    Abstract: How well do language models deal with quantification? In this study, we focus on 'few'-type quantifiers, as in 'few children like toys', which might pose a particular challenge for language models because the sentence components without the quantifier are likely to co-occur, and 'few'-type quantifiers are rare. We present 960 English sentence stimuli from two human neurolinguistic experiments to…

    Submitted 26 May, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

    Comments: Accepted to Findings of ACL 2023

  14. arXiv:2211.05198  [pdf, other]

    cs.CL cs.AI

    Collateral facilitation in humans and language models

    Authors: James A. Michaelov, Benjamin K. Bergen

    Abstract: Are the predictions of humans and language models affected by similar things? Research suggests that while comprehending language, humans make predictions about upcoming words, with more predictable words being processed more easily. However, evidence also shows that humans display a similar processing advantage for highly anomalous words when these words are semantically related to the preceding…

    Submitted 9 November, 2022; originally announced November 2022.

    Comments: Accepted at CoNLL 2022

  15. arXiv:2208.14554  [pdf, other]

    cs.CL cs.AI cs.IT cs.LG

    Do language models make human-like predictions about the coreferents of Italian anaphoric zero pronouns?

    Authors: James A. Michaelov, Benjamin K. Bergen

    Abstract: Some languages allow arguments to be omitted in certain contexts. Yet human language comprehenders reliably infer the intended referents of these zero pronouns, in part because they construct expectations about which referents are more likely. We ask whether Neural Language Models also extract the same expectations. We test whether 12 contemporary language models display expectations that reflect…

    Submitted 3 October, 2022; v1 submitted 30 August, 2022; originally announced August 2022.

    Comments: Accepted at COLING 2022

  16. arXiv:2205.10964  [pdf, other]

    cs.CL

    The Geometry of Multilingual Language Model Representations

    Authors: Tyler A. Chang, Zhuowen Tu, Benjamin K. Bergen

    Abstract: We assess how multilingual language models maintain a shared multilingual representation space while still encoding language-sensitive information in each language. Using XLM-R as a case study, we show that languages occupy similar linear subspaces after mean-centering, evaluated based on causal effects on language modeling performance and direct comparisons between subspaces for 88 languages. The…

    Submitted 21 October, 2022; v1 submitted 22 May, 2022; originally announced May 2022.

    Comments: Accepted to EMNLP 2022

  17. arXiv:2110.02406  [pdf, other]

    cs.CL

    Word Acquisition in Neural Language Models

    Authors: Tyler A. Chang, Benjamin K. Bergen

    Abstract: We investigate how neural language models acquire individual words during training, extracting learning curves and ages of acquisition for over 600 words on the MacArthur-Bates Communicative Development Inventory (Fenson et al., 2007). Drawing on studies of word acquisition in children, we evaluate multiple predictors for words' ages of acquisition in LSTMs, BERT, and GPT-2. We find that the effec…

    Submitted 5 October, 2021; originally announced October 2021.

    Comments: Accepted to TACL (pre-MIT Press version)

  18. arXiv:2109.01226  [pdf, other]

    cs.CL cs.AI cs.IT cs.LG

    So Cloze yet so Far: N400 Amplitude is Better Predicted by Distributional Information than Human Predictability Judgements

    Authors: James A. Michaelov, Seana Coulson, Benjamin K. Bergen

    Abstract: More predictable words are easier to process - they are read faster and elicit smaller neural signals associated with processing difficulty, most notably, the N400 component of the event-related brain potential. Thus, it has been argued that prediction of upcoming words is a key component of language comprehension, and that studying the amplitude of the N400 is a valuable way to investigate the pr…

    Submitted 25 May, 2022; v1 submitted 2 September, 2021; originally announced September 2021.

    Comments: Accepted

    Journal ref: IEEE Transactions on Cognitive and Developmental Systems (2022)

  19. arXiv:2107.09648  [pdf, other]

    cs.CL cs.AI cs.LG

    Different kinds of cognitive plausibility: why are transformers better than RNNs at predicting N400 amplitude?

    Authors: James A. Michaelov, Megan D. Bardolph, Seana Coulson, Benjamin K. Bergen

    Abstract: Despite being designed for performance rather than cognitive plausibility, transformer language models have been found to be better at predicting metrics used to assess human language comprehension than language models with other architectures, such as recurrent neural networks. Based on how well they predict the N400, a neural signal associated with processing difficulty, we propose and provide e…

    Submitted 20 July, 2021; originally announced July 2021.

    Journal ref: Proceedings of the 43rd Annual Meeting of the Cognitive Science Society (2021) 300-306

  20. arXiv:2010.04844  [pdf, other]

    cs.CL cs.AI cs.IT cs.LG q-bio.NC

    How well does surprisal explain N400 amplitude under different experimental conditions?

    Authors: James A. Michaelov, Benjamin K. Bergen

    Abstract: We investigate the extent to which word surprisal can be used to predict a neural measure of human language processing difficulty - the N400. To do this, we use recurrent neural networks to calculate the surprisal of stimuli from previously published neurolinguistic studies of the N400. We find that surprisal can predict N400 amplitude in a wide range of cases, and the cases where it cannot do so…

    Submitted 9 October, 2020; originally announced October 2020.

    Comments: To be presented at CoNLL 2020

    Journal ref: Proceedings of the 24th Conference on Computational Natural Language Learning (2020) 652-663

  21. arXiv:2007.03097  [pdf, other]

    physics.comp-ph astro-ph.IM cs.DC

    FleCSPH: The Next Generation FleCSIble Parallel Computational Infrastructure for Smoothed Particle Hydrodynamics

    Authors: Julien Loiseau, Hyun Lim, Mark Alexander Kaltenborn, Oleg Korobkin, Christopher M. Mauney, Irina Sagert, Wesley P. Even, Benjamin K. Bergen

    Abstract: FleCSPH is a smoothed particle hydrodynamics simulation tool, based on the compile-time configurable framework FleCSI. The asynchronous distributed tree topology combined with a fast multipole method allows FleCSPH to efficiently compute hydrodynamics and long range particle-particle interactions. FleCSPH provides initial data generators, particle relaxation techniques, and standard evolution driv…

    Submitted 6 July, 2020; originally announced July 2020.

  22. arXiv:1312.4991  [pdf, ps, other]

    physics.plasm-ph

    On the velocity space discretization for the Vlasov-Poisson system: comparison between Hermite spectral and Particle-in-Cell methods. Part 2: fully-implicit scheme

    Authors: E. Camporeale, G. L. Delzanno, B. K. Bergen, J. D. Moulton

    Abstract: We describe a spectral method for the numerical solution of the Vlasov-Poisson system where the velocity space is decomposed by means of an Hermite basis, and the configuration space is discretized via a Fourier decomposition. The novelty of our approach is an implicit time discretization that allows exact conservation of charge, momentum and energy. The computational efficiency and the cost-effec…

    Submitted 17 December, 2013; originally announced December 2013.

    Comments: submitted to Journal of Computational Physics; 16 pages, 7 figures. arXiv admin note: text overlap with arXiv:1311.2098

  23. arXiv:1311.2098  [pdf, ps, other]

    physics.plasm-ph physics.comp-ph

    On the velocity space discretization for the Vlasov-Poisson system: comparison between Hermite spectral and Particle-in-Cell methods. Part 1: semi-implicit scheme

    Authors: Enrico Camporeale, Gian Luca Delzanno, Benjamin K. Bergen, J. David Moulton

    Abstract: We discuss a spectral method for the numerical solution of the Vlasov-Poisson system where the velocity space is decomposed by means of an Hermite basis. We describe a semi-implicit time discretization that extends the range of numerical stability relative to an explicit scheme. We also introduce and discuss the effects of an artificial collisional operator, which is necessary to take care of the…

    Submitted 17 December, 2013; v1 submitted 8 November, 2013; originally announced November 2013.

    Comments: 29 pages; 13 figures; submitted to Journal of Computational Physics