-
Encoder vs Decoder: Comparative Analysis of Encoder and Decoder Language Models on Multilingual NLU Tasks
Authors:
Dan Saattrup Nielsen,
Kenneth Enevoldsen,
Peter Schneider-Kamp
Abstract:
This paper explores the performance of encoder and decoder language models on multilingual Natural Language Understanding (NLU) tasks, with a broad focus on Germanic languages. Building upon the ScandEval benchmark, which initially was restricted to evaluating encoder models, we extend the evaluation framework to include decoder models. We introduce a method for evaluating decoder models on NLU tasks and apply it to the languages Danish, Swedish, Norwegian, Icelandic, Faroese, German, Dutch, and English. Through a series of experiments and analyses, we address key research questions regarding the comparative performance of encoder and decoder models, the impact of NLU task types, and the variation across language resources. Our findings reveal that decoder models can achieve significantly better NLU performance than encoder models, with nuances observed across different tasks and languages. Additionally, we investigate the correlation between decoders and task performance via a UMAP analysis, shedding light on the unique capabilities of decoder and encoder models. This study contributes to a deeper understanding of language model paradigms in NLU tasks and provides valuable insights for model selection and evaluation in multilingual settings.
Submitted 19 June, 2024;
originally announced June 2024.
-
Model Agnostic Explainable Selective Regression via Uncertainty Estimation
Authors:
Andrea Pugnana,
Carlos Mougan,
Dan Saattrup Nielsen
Abstract:
With the wide adoption of machine learning techniques, requirements have evolved beyond sheer high performance, often requiring models to be trustworthy. A common approach to increase the trustworthiness of such systems is to allow them to refrain from predicting. Such a framework is known as selective prediction. While selective prediction for classification tasks has been widely analyzed, the problem of selective regression is understudied. This paper presents a novel approach to selective regression that utilizes model-agnostic non-parametric uncertainty estimation. Our proposed framework showcases superior performance compared to state-of-the-art selective regressors, as demonstrated through comprehensive benchmarking on 69 datasets. Finally, we use explainable AI techniques to gain an understanding of the drivers behind selective regression. We implement our selective regression method in the open-source Python package doubt and release the code used to reproduce our experiments.
Submitted 15 November, 2023;
originally announced November 2023.
-
Danish Foundation Models
Authors:
Kenneth Enevoldsen,
Lasse Hansen,
Dan S. Nielsen,
Rasmus A. F. Egebæk,
Søren V. Holm,
Martin C. Nielsen,
Martin Bernstorff,
Rasmus Larsen,
Peter B. Jørgensen,
Malte Højmark-Bertelsen,
Peter B. Vahlstrup,
Per Møldrup-Dalum,
Kristoffer Nielbo
Abstract:
Large language models, sometimes referred to as foundation models, have transformed multiple fields of research. However, smaller languages risk falling behind due to high training costs and small incentives for large companies to train these models. To combat this, the Danish Foundation Models project seeks to provide and maintain open, well-documented, and high-quality foundation models for the Danish language. This is achieved through broad cooperation with public and private institutions, to ensure high data quality and applicability of the trained models. We present the motivation of the project, the current status, and future perspectives.
Submitted 13 November, 2023;
originally announced November 2023.
-
ScandEval: A Benchmark for Scandinavian Natural Language Processing
Authors:
Dan Saattrup Nielsen
Abstract:
This paper introduces a Scandinavian benchmarking platform, ScandEval, which can benchmark any pretrained model on four different tasks in the Scandinavian languages. The datasets used in two of the tasks, linguistic acceptability and question answering, are new. We develop and release a Python package and command-line interface, scandeval, which can benchmark any model that has been uploaded to the Hugging Face Hub, with reproducible results. Using this package, we benchmark more than 100 Scandinavian or multilingual models and present the results of these in an interactive online leaderboard, as well as provide an analysis of the results. The analysis shows that there is substantial cross-lingual transfer among the Mainland Scandinavian languages (Danish, Swedish and Norwegian), with limited cross-lingual transfer between the group of Mainland Scandinavian languages and the group of Insular Scandinavian languages (Icelandic and Faroese). The benchmarking results also show that the investment in language technology in Norway, Sweden and Denmark has led to language models that outperform massively multilingual models such as XLM-RoBERTa and mDeBERTaV3. We release the source code for both the package and leaderboard.
Submitted 3 April, 2023;
originally announced April 2023.
-
Addressing contingency in algorithmic (mis)information classification: Toward a responsible machine learning agenda
Authors:
Andrés Domínguez Hernández,
Richard Owen,
Dan Saattrup Nielsen,
Ryan McConville
Abstract:
Machine learning (ML) enabled classification models are becoming increasingly popular for tackling the sheer volume and speed of online misinformation and other content that could be identified as harmful. In building these models, data scientists need to take a stance on the legitimacy, authoritativeness and objectivity of the sources of "truth" used for model training and testing. This has political, ethical and epistemic implications which are rarely addressed in technical papers. Despite (and due to) their reported high accuracy and performance, ML-driven moderation systems have the potential to shape online public debate and create downstream negative impacts such as undue censorship and the reinforcing of false beliefs. Using collaborative ethnography and theoretical insights from social studies of science and expertise, we offer a critical analysis of the process of building ML models for (mis)information classification: we identify a series of algorithmic contingencies--key moments during model development that could lead to different future outcomes, uncertainty and harmful effects as these tools are deployed by social media platforms. We conclude by offering a tentative path toward reflexive and responsible development of ML tools for moderating misinformation and other harmful content online.
Submitted 13 April, 2023; v1 submitted 5 October, 2022;
originally announced October 2022.
-
MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset
Authors:
Dan Saattrup Nielsen,
Ryan McConville
Abstract:
Misinformation is becoming increasingly prevalent on social media and in news articles. It has become so widespread that we require algorithmic assistance utilising machine learning to detect such content. Training these machine learning models requires datasets of sufficient scale, diversity and quality. However, datasets in the field of automatic misinformation detection are predominantly monolingual, include a limited number of modalities and are not of sufficient scale and quality. Addressing this, we develop a data collection and linking system (MuMiN-trawl), to build a public misinformation graph dataset (MuMiN), containing rich social media data (tweets, replies, users, images, articles, hashtags) spanning 21 million tweets belonging to 26 thousand Twitter threads, each of which has been semantically linked to 13 thousand fact-checked claims across dozens of topics, events and domains, in 41 different languages, spanning more than a decade. The dataset is made available as a heterogeneous graph via a Python package (mumin). We provide baseline results for two node classification tasks related to the veracity of a claim involving social media, and demonstrate that these are challenging tasks, with the highest macro-average F1-score being 62.55% and 61.45% for the two tasks, respectively. The MuMiN ecosystem is available at https://mumin-dataset.github.io/, including the data, documentation, tutorials and leaderboards.
Submitted 8 March, 2022; v1 submitted 23 February, 2022;
originally announced February 2022.
-
Monitoring Model Deterioration with Explainable Uncertainty Estimation via Non-parametric Bootstrap
Authors:
Carlos Mougan,
Dan Saattrup Nielsen
Abstract:
Monitoring machine learning models once they are deployed is challenging. It is even more challenging to decide when to retrain models in real-case scenarios when labeled data is beyond reach, and monitoring performance metrics becomes unfeasible. In this work, we use non-parametric bootstrapped uncertainty estimates and SHAP values to provide explainable uncertainty estimation as a technique that aims to monitor the deterioration of machine learning models in deployment environments, as well as determine the source of model deterioration when target labels are not available. Classical methods are purely aimed at detecting distribution shift, which can lead to false positives in the sense that the model has not deteriorated despite a shift in the data distribution. To estimate model uncertainty we construct prediction intervals using a novel bootstrap method, which improves upon the work of Kumar & Srivastava (2012). We show that both our model deterioration detection system as well as our uncertainty estimation method achieve better performance than the current state-of-the-art. Finally, we use explainable AI techniques to gain an understanding of the drivers of model deterioration. We release an open source Python package, doubt, which implements our proposed methods, as well as the code used to reproduce our experiments.
Submitted 22 November, 2022; v1 submitted 27 January, 2022;
originally announced January 2022.
-
The Virtual Large Cardinal Hierarchy
Authors:
Stamatis Dimopoulos,
Victoria Gitman,
Dan Saattrup Nielsen
Abstract:
We continue the study of the virtual large cardinal hierarchy by analysing virtual versions of superstrong, Woodin, and Berkeley cardinals. Gitman and Schindler showed that virtualizations of strong and supercompact cardinals yield the same large cardinal notion. We provide various equivalent characterizations of virtually Woodin cardinals, including showing that On is virtually Woodin if and only if for every class A, there is a proper class of virtually A-extendible cardinals. We introduce the virtual Vopenka principle for finite languages and show that it is not equivalent to the virtual Vopenka principle (although the two principles are equiconsistent), but is equivalent to the assertion that On is virtually pre-Woodin, a weakening of virtually Woodin, which is equivalent to having for every class A, a weakly virtually A-extendible cardinal. We show that if there are no virtually Berkeley cardinals, then On is virtually Woodin if and only if On is virtually pre-Woodin (if and only if the virtual Vopenka principle for finite languages holds). In particular, if the virtual Vopenka principle holds and On is not Mahlo, then On is not virtually Woodin, and hence there is a virtually Berkeley cardinal.
Submitted 6 May, 2023; v1 submitted 13 September, 2021;
originally announced September 2021.
-
Games and Ramsey-like cardinals
Authors:
Dan Saattrup Nielsen,
Philip Welch
Abstract:
We generalise the $α$-Ramsey cardinals introduced in Holy and Schlicht (2018) for cardinals $α$ to arbitrary ordinals $α$, and answer several questions posed in that paper. In particular, we show that $α$-Ramseys are downwards absolute to the core model $K$ for all $α$ of uncountable cofinality, that strategic $ω$-Ramsey cardinals are equiconsistent with remarkable cardinals and that strategic $α$-Ramsey cardinals are equiconsistent with measurable cardinals for all $α>ω$. We also show that the $n$-Ramseys satisfy indescribability properties and use them to provide a game-theoretic characterisation of completely ineffable cardinals, as well as establishing further connections between the $α$-Ramsey cardinals and the Ramsey-like cardinals introduced in Gitman (2011), Feng (1990) and Sharpe and Welch (2011).
Submitted 30 October, 2018; v1 submitted 27 April, 2018;
originally announced April 2018.
-
Hot dense capsule implosion cores produced by z-pinch dynamic hohlraum radiation
Authors:
J. E. Bailey,
G. A. Chandler,
S. A. Slutz,
I. Golovkin,
P. W. Lake,
J. J. MacFarlane,
R. C. Mancini,
T. J. Buris-Mog,
G. Cooper,
R. J. Leeper,
T. A. Mehlhorn,
T. C. Moore,
T. J. Nash,
D. S. Nielsen,
C. L. Ruiz,
D. G. Schroen,
W. A. Varnum
Abstract:
Hot dense capsule implosions driven by z-pinch x-rays have been measured for the first time. A ~220 eV dynamic hohlraum imploded 1.7-2.1 mm diameter gas-filled CH capsules which absorbed up to ~20 kJ of x-rays. Argon tracer atom spectra were used to measure the $T_e \sim 1$ keV electron temperature and the $n_e \sim 1$-$4 \times 10^{23}$ cm$^{-3}$ electron density. Spectra from multiple directions provide core symmetry estimates. Computer simulations agree well with the peak compression values of $T_e$, $n_e$, and symmetry, indicating reasonable understanding of the hohlraum and implosion physics.
Submitted 4 June, 2003;
originally announced June 2003.