-
Evaluating Contextualized Language Models for Hungarian
Abstract: We present an extended comparison of contextualized language models for Hungarian. We compare huBERT, a Hungarian model, against four multilingual models, including the multilingual BERT model. We evaluate these models on three tasks: morphological probing, POS tagging, and NER. We find that huBERT works better than the other models, often by a large margin, particularly near the global optimum (ty…
Submitted 22 February, 2021; originally announced February 2021.
Journal ref: Hungarian NLP Conference (MSZNY2021)
-
emLam -- a Hungarian Language Modeling baseline
Abstract: This paper aims to make up for the lack of documented baselines for Hungarian language modeling. Various approaches are evaluated on three publicly available Hungarian corpora. Perplexity values comparable to models of similar-sized English corpora are reported. A new, freely downloadable Hungarian benchmark corpus is introduced.
Submitted 26 January, 2017; originally announced January 2017.
Comments: Additional resources: the emLam repository (https://github.com/DavidNemeskey/emLam) and the emLam corpus (http://hlt.bme.hu/en/resources/emLam)
ACM Class: I.2.7
Journal ref: In Proceedings of the 13th Conference on Hungarian Computational Linguistics (MSZNY), pp. 91-102. Szeged, 2017