Search | arXiv e-print repository

Heuristic Algorithms for the Approximation of Mutual Coherence

Authors: Gregor Betz, Vera Chekan, Tamara Mchedlidze

Abstract: Mutual coherence is a measure of similarity between two opinions. Although the notion comes from philosophy, it is essential for a wide range of technologies, e.g., the Wahl-O-Mat system. In Germany, this system helps voters to find candidates that are the closest to their political preferences. The exact computation of mutual coherence is highly time-consuming due to the iteration over all subsets of an opinion. Moreover, for every subset, an instance of the SAT model counting problem has to be solved, which is known to be a hard problem in computer science. This work is the first study to accelerate this computation. We model the distribution of the so-called confirmation values as a mixture of three Gaussians and present efficient heuristics to estimate its model parameters. The mutual coherence is then approximated with the expected value of the distribution. Some of the presented algorithms are fully polynomial-time, others only require solving a small number of instances of the SAT model counting problem. The average squared error of our best algorithm lies below 0.0035, which is insignificant if the efficiency is taken into account. Furthermore, the accuracy is precise enough to be used in Wahl-O-Mat-like systems.

Submitted 4 July, 2023; originally announced July 2023.

Comments: Results from 2021

arXiv:2206.04615 [pdf, other]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline.
Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2110.01509 [pdf, other]

DeepA2: A Modular Framework for Deep Argument Analysis with Pretrained Neural Text2Text Language Models

Authors: Gregor Betz, Kyle Richardson

Abstract: In this paper, we present and implement a multi-dimensional, modular framework for performing deep argument analysis (DeepA2) using current pre-trained language models (PTLMs). ArgumentAnalyst -- a T5 model (Raffel et al. 2020) set up and trained within DeepA2 -- reconstructs argumentative texts, which advance an informal argumentation, as valid arguments: It inserts, e.g., missing premises and conclusions, formalizes inferences, and coherently links the logical reconstruction to the source text. We create a synthetic corpus for deep argument analysis, and evaluate ArgumentAnalyst on this new dataset as well as on existing data, specifically EntailmentBank (Dalvi et al. 2021). Our empirical findings vindicate the overall framework and highlight the advantages of a modular design, in particular its ability to emulate established heuristics (such as hermeneutic cycles), to explore the model's uncertainty, to cope with the plurality of correct solutions (underdetermination), and to exploit higher-order evidence.

Submitted 1 July, 2022; v1 submitted 4 October, 2021; originally announced October 2021.

Comments: A Demo is available at https://huggingface.co/spaces/debatelab/deepa2-demo , the model can be downloaded from https://huggingface.co/debatelab/argument-analyst , and the datasets can be accessed at https://huggingface.co/datasets/debatelab/aaac

Journal ref: *SEM 2022

arXiv:2104.06737 [pdf, other]

doi 10.18564/jasss.4725

Natural-Language Multi-Agent Simulations of Argumentative Opinion Dynamics

Authors: Gregor Betz

Abstract: This paper develops a natural-language agent-based model of argumentation (ABMA). Its artificial deliberative agents (ADAs) are constructed with the help of so-called neural language models recently developed in AI and computational linguistics. ADAs are equipped with a minimalist belief system and may generate and submit novel contributions to a conversation. The natural-language ABMA allows us to simulate collective deliberation in English, i.e. with arguments, reasons, and claims themselves -- rather than with their mathematical representations (as in formal models). This paper uses the natural-language ABMA to test the robustness of formal reason-balancing models of argumentation [Maes & Flache 2013, Singer et al. 2019]: First of all, as long as ADAs remain passive, confirmation bias and homophily updating trigger polarization, which is consistent with results from formal models. However, once ADAs start to actively generate new contributions, the evolution of a conversation is dominated by properties of the agents *as authors*. This suggests that the creation of new arguments, reasons, and claims critically affects a conversation and is of pivotal importance for understanding the dynamics of collective deliberation. The paper closes by pointing out further fruitful applications of the model and challenges for future research.

Submitted 14 April, 2021; originally announced April 2021.

arXiv:2103.13033 [pdf, other]

Thinking Aloud: Dynamic Context Generation Improves Zero-Shot Reasoning Performance of GPT-2

Authors: Gregor Betz, Kyle Richardson, Christian Voigt

Abstract: Thinking aloud is an effective meta-cognitive strategy human reasoners apply to solve difficult problems. We suggest improving the reasoning ability of pre-trained neural language models in a similar way, namely by expanding a task's context with problem elaborations that are dynamically generated by the language model itself. Our main result is that dynamic problem elaboration significantly improves the zero-shot performance of GPT-2 in a deductive reasoning and natural language inference task: While the model uses a syntactic heuristic for predicting an answer, it is capable (to some degree) of generating reasoned additional context which facilitates the successful application of its heuristic. We explore different ways of generating elaborations, including few-shot learning, and find that their relative performance varies with the specific problem characteristics (such as problem difficulty). Moreover, the effectiveness of an elaboration can be explained in terms of the degree to which the elaboration semantically coheres with the corresponding problem. In particular, elaborations that are most faithful to the original problem description may boost accuracy by up to 24%.

Submitted 24 March, 2021; originally announced March 2021.

arXiv:2009.07185 [pdf, other]

Critical Thinking for Language Models

Authors: Gregor Betz, Christian Voigt, Kyle Richardson

Abstract: This paper takes a first step towards a critical thinking curriculum for neural auto-regressive language models. We introduce a synthetic corpus of deductively valid arguments, and generate artificial argumentative texts to train and evaluate GPT-2. Significant transfer learning effects can be observed: Training a model on three simple core schemes allows it to accurately complete conclusions of different, and more complex types of arguments, too. The language models generalize the core argument schemes in a correct way. Moreover, we obtain consistent and promising results for NLU benchmarks. In particular, pre-training on the argument schemes raises zero-shot accuracy on the GLUE diagnostics by up to 15 percentage points. The findings suggest that intermediary pre-training on texts that exemplify basic reasoning abilities (such as typically covered in critical thinking textbooks) might help language models to acquire a broad range of reasoning skills. The synthetic argumentative texts presented in this paper are a promising starting point for building such a "critical thinking curriculum for language models."

Submitted 17 December, 2020; v1 submitted 15 September, 2020; originally announced September 2020.

arXiv:1704.04384 [pdf, other]

doi 10.1021/acsami.7b01237

Thermostat Influence on the Structural Development and Material Removal during Abrasion of Nanocrystalline Ferrite

Authors: Stefan J. Eder, Ulrike Cihak-Bayr, Davide Bianchi, Gregor Feldbauer, Gerhard Betz

Abstract: We consider a nanomachining process of hard, abrasive particles grinding on the rough surface of a polycrystalline ferritic work piece. Using extensive large-scale molecular dynamics (MD) simulations, we show that the mode of thermostatting, i.e., the way that the heat generated through deformation and friction is removed from the system, has crucial impact on tribological and materials related phenomena. By adopting an electron-phonon coupling approach to parametrize the thermostat of the system, thus including the electronic contribution to the thermal conductivity of iron, we can reproduce the experimentally measured values that yield realistic temperature gradients in the work piece. We compare these results to those obtained by assuming the two extreme cases of only phononic heat conduction and instantaneous removal of the heat generated in the machining interface. Our discussion of the differences between these three cases reveals that although the average shear stress is virtually temperature independent up to a normal pressure of approximately 1 GPa, the grain and chip morphology as well as most relevant quantities depend heavily on the mode of thermostatting beyond a normal pressure of 0.4 GPa. These pronounced differences can be explained by the thermally activated processes that guide the reaction of the Fe lattice to the external mechanical and thermal loads caused by nanomachining.

Submitted 14 April, 2017; originally announced April 2017.

Journal ref: ACS Applied Materials & Interfaces 9 (15), 13713-13725, 2017

arXiv:1209.5526 [pdf, other]

doi 10.1016/j.nimb.2013.01.046

Modelling surface restructuring by slow highly charged ions

Authors: G. Wachter, K. Tökési, G. Betz, C. Lemell, J. Burgdörfer

Abstract: We theoretically investigate surface modifications on alkaline earth halides due to highly charged ion impact, focusing on recent experimental evidence for both etch pit and nano-hillock formation on CaF2 [A. El-Said et al., PRL 109, 117602 (2012)]. We discuss mechanisms for converting the projectile potential and kinetic energies into thermal energy capable of changing the surface structure. A proof-of-principle classical molecular dynamics simulation suggests the existence of two thresholds which we associate with etch pit and nano-hillock formation in qualitative agreement with experiment.

Submitted 15 January, 2013; v1 submitted 25 September, 2012; originally announced September 2012.

Comments: Accepted in proceedings of IISC-2012 (NIM B)

Showing 1–8 of 8 results for author: Betz, G