
Showing 1–11 of 11 results for author: Kornai, A

  1. Morphosyntactic probing of multilingual BERT models

    Authors: Judit Acs, Endre Hamerlik, Roy Schwartz, Noah A. Smith, Andras Kornai

    Abstract: We introduce an extensive dataset for multilingual probing of morphological information in language models (247 tasks across 42 languages from 10 families), each consisting of a sentence with a target word and a morphological tag as the desired label, derived from the Universal Dependencies treebanks. We find that pre-trained Transformer models (mBERT and XLM-RoBERTa) learn features that attain st…

    Submitted 9 June, 2023; originally announced June 2023.

    Comments: to appear in the Journal of Natural Language Engineering

  2. arXiv:2303.00752

    cs.AI

    Safety without alignment

    Authors: András Kornai, Michael Bukatin, Zsolt Zombori

    Abstract: Currently, the dominant paradigm in AI safety is alignment with human values. Here we describe progress on developing an alternative approach to safety, based on ethical rationalism (Gewirth:1978), and propose an inherently safe implementation path via hybrid theorem provers in a sandbox. As AGIs evolve, their alignment may fade, but their rationality can only increase (otherwise more rational one…

    Submitted 18 March, 2023; v1 submitted 27 February, 2023; originally announced March 2023.

  3. arXiv:2109.06327

    cs.CL

    Evaluating Transferability of BERT Models on Uralic Languages

    Authors: Judit Ács, Dániel Lévai, András Kornai

    Abstract: Transformer-based language models such as BERT have outperformed previous models on a large number of English benchmarks, but their evaluation is often limited to English or a small number of well-resourced languages. In this work, we evaluate monolingual, multilingual, and randomly initialized language models from the BERT family on a variety of Uralic languages including Estonian, Finnish, Hunga…

    Submitted 23 November, 2021; v1 submitted 13 September, 2021; originally announced September 2021.

    Comments: Seventh International Workshop for Computational Linguistics of Uralic Languages (IWCLUL 2021)

  4. arXiv:2102.10864

    cs.CL

    Subword Pooling Makes a Difference

    Authors: Judit Ács, Ákos Kádár, András Kornai

    Abstract: Contextual word-representations became a standard in modern natural language processing systems. These models use subword tokenization to handle large vocabularies and unknown words. Word-level usage of such systems requires a way of pooling multiple subwords that correspond to a single word. In this paper we investigate how the choice of subword pooling affects the downstream performance on three…

    Submitted 29 March, 2021; v1 submitted 22 February, 2021; originally announced February 2021.

    Journal ref: EACL2021

  5. arXiv:2102.10848

    cs.CL

    Evaluating Contextualized Language Models for Hungarian

    Authors: Judit Ács, Dániel Lévai, Dávid Márk Nemeskey, András Kornai

    Abstract: We present an extended comparison of contextualized language models for Hungarian. We compare huBERT, a Hungarian model against 4 multilingual models including the multilingual BERT model. We evaluate these models through three tasks, morphological probing, POS tagging and NER. We find that huBERT works better than the other models, often by a large margin, particularly near the global optimum (ty…

    Submitted 22 February, 2021; originally announced February 2021.

    Journal ref: Hungarian NLP Conference (MSZNY2021)

  6. arXiv:2012.04575

    cs.CL

    The Role of Interpretable Patterns in Deep Learning for Morphology

    Authors: Judit Acs, Andras Kornai

    Abstract: We examine the role of character patterns in three tasks: morphological analysis, lemmatization and copy. We use a modified version of the standard sequence-to-sequence model, where the encoder is a pattern matching network. Each pattern scores all possible N character long subwords (substrings) on the source side, and the highest scoring subword's score is used to initialize the decoder as well a…

    Submitted 8 December, 2020; originally announced December 2020.

    Comments: Best paper at the Hungarian NLP conference (MSZNY2020)

    Journal ref: XVI. Magyar Számítógépes Nyelvészeti Konferencia, 2020, page 171-179 (MSZNY2020)

  7. arXiv:1905.10924

    math.HO cs.AI math.PR

    Naive probability

    Authors: Zalan Gyenis, Andras Kornai

    Abstract: We describe a rational, but low resolution model of probability.

    Submitted 13 December, 2021; v1 submitted 20 May, 2019; originally announced May 2019.

    Comments: 8 pages

    ACM Class: I.2.3; I.2.4

  8. arXiv:1905.09139

    cs.CL

    Sentence Length

    Authors: Gábor Borbély, András Kornai

    Abstract: The distribution of sentence length in ordinary language is not well captured by the existing models. Here we survey previous models of sentence length and present our random walk model that offers both a better fit with the data and a better understanding of the distribution. We develop a generalization of KL divergence, discuss measuring the noise inherent in a corpus, and present a hyperparamet…

    Submitted 22 May, 2019; originally announced May 2019.

  9. arXiv:1204.2765

    cs.CL physics.data-an physics.soc-ph

    A practical approach to language complexity: a Wikipedia case study

    Authors: Taha Yasseri, András Kornai, János Kertész

    Abstract: In this paper we present statistical analysis of English texts from Wikipedia. We try to address the issue of language complexity empirically by comparing the simple English Wikipedia (Simple) to comparable samples of the main English Wikipedia (Main). Simple is supposed to use a more simplified language with a limited vocabulary, and editors are explicitly requested to follow this guideline, yet…

    Submitted 18 August, 2012; v1 submitted 12 April, 2012; originally announced April 2012.

    Comments: 2 new figures, 1 new section, and 2 new supporting texts

    Journal ref: PLoS ONE 7(11): e48386 (2012)

  10. arXiv:1202.3643

    physics.soc-ph cs.SI physics.data-an

    Dynamics of conflicts in Wikipedia

    Authors: Taha Yasseri, Robert Sumi, András Rung, András Kornai, János Kertész

    Abstract: In this work we study the dynamical features of editorial wars in Wikipedia (WP). Based on our previously established algorithm, we build up samples of controversial and peaceful articles and analyze the temporal characteristics of the activity in these samples. On short time scales, we show that there is a clear correspondence between conflict and burstiness of activity patterns, and that memory…

    Submitted 2 May, 2012; v1 submitted 16 February, 2012; originally announced February 2012.

    Comments: Supporting information added

    Journal ref: PLoS ONE 7(6): e38869 (2012)

  11. arXiv:1107.3689

    stat.ML cs.DL physics.data-an physics.soc-ph

    Edit wars in Wikipedia

    Authors: Róbert Sumi, Taha Yasseri, András Rung, András Kornai, János Kertész

    Abstract: We present a new, efficient method for automatically detecting severe conflicts `edit wars' in Wikipedia and evaluate this method on six different language WPs. We discuss how the number of edits, reverts, the length of discussions, the burstiness of edits and reverts deviate in such pages from those following the general workflow, and argue that earlier work has significantly over-estimated the c…

    Submitted 9 February, 2012; v1 submitted 19 July, 2011; originally announced July 2011.

    Comments: 4 pages, 2 figures, 3 tables. The current version is shortened to be published in SocialCom 2011

    Journal ref: IEEE Third International Conference on Social Computing (SocialCom), 9-11 Oct. 2011, 724-727, Boston, MA, USA