Showing 1–2 of 2 results for author: Nyist, M K

  1. arXiv:2404.03555  [pdf, other]

    cs.CL

    From News to Summaries: Building a Hungarian Corpus for Extractive and Abstractive Summarization

    Authors: Botond Barta, Dorina Lakatos, Attila Nagy, Milán Konor Nyist, Judit Ács

    Abstract: Training summarization models requires substantial amounts of training data. However, for less resourceful languages like Hungarian, openly available models and datasets are notably scarce. To address this gap, our paper introduces HunSum-2, an open-source Hungarian corpus suitable for training abstractive and extractive summarization models. The dataset is assembled from segments of the Common Crawl…

    Submitted 12 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

  2. arXiv:2302.00455  [pdf, other]

    cs.CL

    HunSum-1: an Abstractive Summarization Dataset for Hungarian

    Authors: Botond Barta, Dorina Lakatos, Attila Nagy, Milán Konor Nyist, Judit Ács

    Abstract: We introduce HunSum-1: a dataset for Hungarian abstractive summarization, consisting of 1.14M news articles. The dataset is built by collecting, cleaning and deduplicating data from 9 major Hungarian news sites through CommonCrawl. Using this dataset, we build abstractive summarizer models based on huBERT and mT5. We demonstrate the value of the created dataset by performing a quantitative and qua…

    Submitted 1 February, 2023; originally announced February 2023.