Showing 1–20 of 20 results for author: Jumelet, J

  1. arXiv:2407.02136  [pdf, other]

    cs.CL

    Black Big Boxes: Do Language Models Hide a Theory of Adjective Order?

    Authors: Jaap Jumelet, Lisa Bylinina, Willem Zuidema, Jakub Szymanik

    Abstract: In English and other languages, multiple adjectives in a complex noun phrase show intricate ordering patterns that have been a target of much linguistic theory. These patterns offer an opportunity to assess the ability of language models (LMs) to learn subtle rules of language involving factors that cross the traditional divisions of syntax, semantics, and pragmatics. We review existing hypotheses…

    Submitted 2 July, 2024; originally announced July 2024.

  2. arXiv:2406.06441  [pdf, other]

    cs.CL cs.AI

    Interpretability of Language Models via Task Spaces

    Authors: Lucas Weber, Jaap Jumelet, Elia Bruni, Dieuwke Hupkes

    Abstract: The usual way to interpret language models (LMs) is to test their performance on different benchmarks and subsequently infer their internal processes. In this paper, we present an alternative approach, concentrating on the quality of LM processing, with a focus on their language abilities. To this end, we construct 'linguistic task spaces' -- representations of an LM's language conceptualisation -…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: To be published at ACL 2024 (main)

  3. arXiv:2406.04847  [pdf, other]

    cs.CL

    Do Language Models Exhibit Human-like Structural Priming Effects?

    Authors: Jaap Jumelet, Willem Zuidema, Arabella Sinclair

    Abstract: We explore which linguistic factors -- at the sentence and token level -- play an important role in influencing language model predictions, and investigate whether these are reflective of results found in humans and human corpora (Gries and Kootstra, 2017). We make use of the structural priming paradigm, where recent exposure to a structure facilitates processing of the same structure. We don't on…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: ACL Findings 2024

  4. arXiv:2405.15750  [pdf, other]

    cs.CL cs.AI cs.LG

    Filtered Corpus Training (FiCT) Shows that Language Models can Generalize from Indirect Evidence

    Authors: Abhinav Patil, Jaap Jumelet, Yu Ying Chiu, Andy Lapastora, Peter Shen, Lexie Wang, Clevis Willrich, Shane Steinert-Threlkeld

    Abstract: This paper introduces Filtered Corpus Training, a method that trains language models (LMs) on corpora with certain linguistic constructions filtered out from the training data, and uses it to measure the ability of LMs to perform linguistic generalization on the basis of indirect evidence. We apply the method to both LSTM and Transformer LMs (of roughly comparable size), developing filtered corpor…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 10 pages + 7 pages of references/appendices. For code and trained models, see http://github.com/CLMBRs/corpus-filtering

  5. arXiv:2311.13061  [pdf, other]

    cs.CL

    Attribution and Alignment: Effects of Local Context Repetition on Utterance Production and Comprehension in Dialogue

    Authors: Aron Molnar, Jaap Jumelet, Mario Giulianelli, Arabella Sinclair

    Abstract: Language models are often used as the backbone of modern dialogue systems. These models are pre-trained on large amounts of written fluent language. Repetition is typically penalised when evaluating language model generations. However, it is a key component of dialogue. Humans use local and partner specific repetitions; these are preferred by human users and lead to more successful communication i…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: CoNLL 2023

  6. arXiv:2310.14840  [pdf, other]

    cs.CL

    Transparency at the Source: Evaluating and Interpreting Language Models With Access to the True Distribution

    Authors: Jaap Jumelet, Willem Zuidema

    Abstract: We present a setup for training, evaluating and interpreting neural language models, that uses artificial, language-like data. The data is generated using a massive probabilistic grammar (based on state-split PCFGs), that is itself derived from a large natural language corpus, but also provides us complete control over the generative process. We describe and release both grammar and corpus, and te…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: EMNLP Findings 2023

  7. arXiv:2310.11282  [pdf, other]

    cs.CL

    ChapGTP, ILLC's Attempt at Raising a BabyLM: Improving Data Efficiency by Automatic Task Formation

    Authors: Jaap Jumelet, Michael Hanna, Marianne de Heer Kloots, Anna Langedijk, Charlotte Pouw, Oskar van der Wal

    Abstract: We present the submission of the ILLC at the University of Amsterdam to the BabyLM challenge (Warstadt et al., 2023), in the strict-small track. Our final model, ChapGTP, is a masked language model that was trained for 200 epochs, aided by a novel data augmentation technique called Automatic Task Formation. We discuss in detail the performance of this model on the three evaluation suites: BLiMP, (…

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: Part of the BabyLM challenge at CoNLL

  8. arXiv:2310.03686  [pdf, other]

    cs.CL

    DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers

    Authors: Anna Langedijk, Hosein Mohebbi, Gabriele Sarti, Willem Zuidema, Jaap Jumelet

    Abstract: In recent years, many interpretability methods have been proposed to help interpret the internal states of Transformer-models, at different levels of precision and complexity. Here, to analyze encoder-decoder Transformers, we propose a simple, new method: DecoderLens. Inspired by the LogitLens (for decoder-only Transformers), this method involves allowing the decoder to cross-attend representation…

    Submitted 3 April, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

    Comments: Accepted to Findings of NAACL 2024

  9. arXiv:2308.12202  [pdf, other]

    cs.LG cs.CL

    Curriculum Learning with Adam: The Devil Is in the Wrong Details

    Authors: Lucas Weber, Jaap Jumelet, Paul Michel, Elia Bruni, Dieuwke Hupkes

    Abstract: Curriculum learning (CL) posits that machine learning models -- similar to humans -- may learn more efficiently from data that match their current learning progress. However, CL methods are still poorly understood and, in particular for natural language processing (NLP), have achieved only limited success. In this paper, we explore why. Starting from an attempt to replicate and extend a number of…

    Submitted 23 August, 2023; originally announced August 2023.

  10. arXiv:2306.12181  [pdf, other]

    cs.CL

    Feature Interactions Reveal Linguistic Structure in Language Models

    Authors: Jaap Jumelet, Willem Zuidema

    Abstract: We study feature interactions in the context of feature attribution methods for post-hoc interpretability. In interpretability research, getting to grips with feature interactions is increasingly recognised as an important challenge, because interacting features are key to the success of neural networks. Feature interactions allow a model to build up hierarchical representations for its input, and…

    Submitted 21 June, 2023; originally announced June 2023.

    Comments: ACL Findings 2023

  11. arXiv:2207.10245  [pdf, other]

    cs.CL cs.AI

    The Birth of Bias: A case study on the evolution of gender bias in an English language model

    Authors: Oskar van der Wal, Jaap Jumelet, Katrin Schulz, Willem Zuidema

    Abstract: Detecting and mitigating harmful biases in modern language models are widely recognized as crucial, open problems. In this paper, we take a step back and investigate how language models come to be biased in the first place. We use a relatively small language model, using the LSTM architecture trained on an English Wikipedia corpus. With full access to the data and to the model parameters as they c…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: Accepted at the 4th Workshop on Gender Bias in Natural Language Processing (NAACL, 2022)

  12. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  13. arXiv:2109.14989  [pdf, other]

    cs.CL

    Structural Persistence in Language Models: Priming as a Window into Abstract Language Representations

    Authors: Arabella Sinclair, Jaap Jumelet, Willem Zuidema, Raquel Fernández

    Abstract: We investigate the extent to which modern, neural language models are susceptible to structural priming, the phenomenon whereby the structure of a sentence makes the same structure more probable in a follow-up sentence. We explore how priming can be used to study the potential of these models to learn abstract structural information, which is a prerequisite for good performance on tasks that requi…

    Submitted 29 June, 2022; v1 submitted 30 September, 2021; originally announced September 2021.

    Comments: Published in TACL, MIT Press

  14. arXiv:2105.13818  [pdf, other]

    cs.CL

    Language Models Use Monotonicity to Assess NPI Licensing

    Authors: Jaap Jumelet, Milica Denić, Jakub Szymanik, Dieuwke Hupkes, Shane Steinert-Threlkeld

    Abstract: We investigate the semantic knowledge of language models (LMs), focusing on (1) whether these LMs create categories of linguistic environments based on their semantic monotonicity properties, and (2) whether these categories play a similar role in LMs as in human language understanding, using negative polarity item licensing as a case study. We introduce a series of experiments consisting of probi…

    Submitted 28 May, 2021; originally announced May 2021.

    Comments: Published in ACL Findings 2021

  15. arXiv:2104.12424  [pdf, other]

    cs.CL

    Attention vs non-attention for a Shapley-based explanation method

    Authors: Tom Kersten, Hugh Mee Wong, Jaap Jumelet, Dieuwke Hupkes

    Abstract: The field of explainable AI has recently seen an explosion in the number of explanation methods for highly non-linear deep neural networks. The extent to which such methods -- that are often proposed and tested in the domain of computer vision -- are appropriate to address the explainability challenges in NLP is yet relatively unexplored. In this work, we consider Contextual Decomposition (CD) --…

    Submitted 26 April, 2021; originally announced April 2021.

    Comments: Accepted for publication at DeeLIO 2021

  16. arXiv:2101.11287  [pdf, other]

    cs.CL cs.LG

    Language Modelling as a Multi-Task Problem

    Authors: Lucas Weber, Jaap Jumelet, Elia Bruni, Dieuwke Hupkes

    Abstract: In this paper, we propose to study language modelling as a multi-task problem, bringing together three strands of research: multi-task learning, linguistics, and interpretability. Based on hypotheses derived from linguistic theory, we investigate whether language models adhere to learning principles of multi-task learning during training. To showcase the idea, we analyse the generalisation behavio…

    Submitted 27 January, 2021; originally announced January 2021.

    Comments: Accepted for publication at EACL 2021

  17. arXiv:2011.06819  [pdf, other]

    cs.CL cs.LG

    diagNNose: A Library for Neural Activation Analysis

    Authors: Jaap Jumelet

    Abstract: In this paper we introduce diagNNose, an open source library for analysing the activations of deep neural networks. diagNNose contains a wide array of interpretability techniques that provide fundamental insights into the inner workings of neural networks. We demonstrate the functionality of diagNNose with a case study on subject-verb agreement within language models. diagNNose is available at htt…

    Submitted 13 November, 2020; originally announced November 2020.

    Comments: Accepted to the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, EMNLP 2020

  18. The 2019/20 Australian wildfires generated a persistent smoke-charged vortex rising up to 35 km altitude

    Authors: Sergey Khaykin, Bernard Legras, Silvia Bucci, Pasquale Sellitto, Lars Isaksen, Florent Tence, Slimane Bekki, Adam Bourassa, Landon Rieger, Daniel Zawada, Julien Jumelet, Sophie Godin-Beekman

    Abstract: The Australian bushfires around the turn of the year 2020 generated an unprecedented perturbation of stratospheric composition, dynamical circulation and radiative balance. Here we show from satellite observations that the resulting planetary-scale blocking of solar radiation by the smoke is larger than any previously documented wildfires and of the same order as the radiative forcing produced by…

    Submitted 24 August, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

    Comments: to appear in Communications Earth & Environment

    Journal ref: Communications Earth and Environment, 2020

  19. arXiv:1909.08975  [pdf, other]

    cs.CL cs.AI stat.ML

    Analysing Neural Language Models: Contextual Decomposition Reveals Default Reasoning in Number and Gender Assignment

    Authors: Jaap Jumelet, Willem Zuidema, Dieuwke Hupkes

    Abstract: Extensive research has recently shown that recurrent neural language models are able to process a wide range of grammatical phenomena. How these models are able to perform these remarkable feats so well, however, is still an open question. To gain more insight into what information LSTMs base their decisions on, we propose a generalisation of Contextual Decomposition (GCD). In particular, this set…

    Submitted 19 September, 2019; originally announced September 2019.

    Comments: To appear at CoNLL2019

  20. arXiv:1808.10627  [pdf, other]

    cs.CL

    Do Language Models Understand Anything? On the Ability of LSTMs to Understand Negative Polarity Items

    Authors: Jaap Jumelet, Dieuwke Hupkes

    Abstract: In this paper, we attempt to link the inner workings of a neural language model to linguistic theory, focusing on a complex phenomenon well discussed in formal linguistics: (negative) polarity items. We briefly discuss the leading hypotheses about the licensing contexts that allow negative polarity items and evaluate to what extent a neural language model has the ability to correctly process a s…

    Submitted 31 August, 2018; originally announced August 2018.

    Comments: Accepted to the EMNLP workshop "Analyzing and interpreting neural networks for NLP"