Showing 1–2 of 2 results for author: Nemeskey, D M

  1. arXiv:2102.10848  [pdf, other]

    cs.CL

    Evaluating Contextualized Language Models for Hungarian

    Authors: Judit Ács, Dániel Lévai, Dávid Márk Nemeskey, András Kornai

    Abstract: We present an extended comparison of contextualized language models for Hungarian. We compare huBERT, a Hungarian model, against 4 multilingual models, including the multilingual BERT model. We evaluate these models through three tasks: morphological probing, POS tagging and NER. We find that huBERT works better than the other models, often by a large margin, particularly near the global optimum (ty…

    Submitted 22 February, 2021; originally announced February 2021.

    Journal ref: Hungarian NLP Conference (MSZNY2021)

  2. arXiv:1701.07880  [pdf]

    cs.CL

    emLam -- a Hungarian Language Modeling baseline

    Authors: Dávid Márk Nemeskey

    Abstract: This paper aims to make up for the lack of documented baselines for Hungarian language modeling. Various approaches are evaluated on three publicly available Hungarian corpora. Perplexity values comparable to models of similar-sized English corpora are reported. A new, freely downloadable Hungarian benchmark corpus is introduced.

    Submitted 26 January, 2017; originally announced January 2017.

    Comments: Additional resources: the emLam repository (https://github.com/DavidNemeskey/emLam) and the emLam corpus (http://hlt.bme.hu/en/resources/emLam)

    ACM Class: I.2.7

    Journal ref: In Proceedings of the 13th Conference on Hungarian Computational Linguistics (MSZNY), pp. 91-102. Szeged, 2017