Showing 1–9 of 9 results for author: Macko, D

  1. arXiv:2406.12549

    cs.CL cs.AI

    MultiSocial: Multilingual Benchmark of Machine-Generated Text Detection of Social-Media Texts

    Authors: Dominik Macko, Jakub Kopal, Robert Moro, Ivan Srba

    Abstract: Recent LLMs are able to generate high-quality multilingual texts, indistinguishable for humans from authentic human-written ones. Research in machine-generated text detection is however mostly focused on the English language and longer texts, such as news articles, scientific papers or student essays. Social-media texts are usually much shorter and often feature informal language, grammatical erro…

    Submitted 18 June, 2024; originally announced June 2024.

  2. arXiv:2402.13671

    cs.CL cs.AI

    KInIT at SemEval-2024 Task 8: Fine-tuned LLMs for Multilingual Machine-Generated Text Detection

    Authors: Michal Spiegel, Dominik Macko

    Abstract: SemEval-2024 Task 8 is focused on multigenerator, multidomain, and multilingual black-box machine-generated text detection. Such a detection is important for preventing a potential misuse of large language models (LLMs), the newest of which are very capable in generating multilingual human-like texts. We have coped with this task in multiple ways, utilizing language identification and parameter-ef…

    Submitted 17 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: SemEval-2024 Task 8

  3. arXiv:2401.07867

    cs.CL

    Authorship Obfuscation in Multilingual Machine-Generated Text Detection

    Authors: Dominik Macko, Robert Moro, Adaku Uchendu, Ivan Srba, Jason Samuel Lucas, Michiharu Yamashita, Nafis Irtiza Tripto, Dongwon Lee, Jakub Simko, Maria Bielikova

    Abstract: High-quality text generation capability of recent Large Language Models (LLMs) causes concerns about their misuse (e.g., in massive generation/spread of disinformation). Machine-generated text (MGT) detection is important to cope with such threats. However, it is susceptible to authorship obfuscation (AO) methods, such as paraphrasing, which can cause MGTs to evade detection. So far, this was eval…

    Submitted 18 June, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

  4. arXiv:2311.12574

    cs.CL cs.AI

    IMGTB: A Framework for Machine-Generated Text Detection Benchmarking

    Authors: Michal Spiegel, Dominik Macko

    Abstract: In the era of large language models generating high quality texts, it is a necessity to develop methods for detection of machine-generated text to avoid harmful use or simply due to annotation purposes. It is, however, also important to properly evaluate and compare such developed methods. Recently, a few benchmarks have been proposed for this purpose; however, integration of newest detection meth…

    Submitted 21 November, 2023; originally announced November 2023.

  5. arXiv:2311.08838

    cs.CL

    Disinformation Capabilities of Large Language Models

    Authors: Ivan Vykopal, Matúš Pikuliak, Ivan Srba, Robert Moro, Dominik Macko, Maria Bielikova

    Abstract: Automated disinformation generation is often listed as an important risk associated with large language models (LLMs). The theoretical ability to flood the information space with disinformation content might have dramatic consequences for societies around the world. This paper presents a comprehensive study of the disinformation capabilities of the current generation of LLMs to generate false news…

    Submitted 23 February, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

  6. arXiv:2311.08374

    cs.CL

    A Ship of Theseus: Curious Cases of Paraphrasing in LLM-Generated Texts

    Authors: Nafis Irtiza Tripto, Saranya Venkatraman, Dominik Macko, Robert Moro, Ivan Srba, Adaku Uchendu, Thai Le, Dongwon Lee

    Abstract: In the realm of text manipulation and linguistic transformation, the question of authorship has been a subject of fascination and philosophical inquiry. Much like the Ship of Theseus paradox, which ponders whether a ship remains the same when each of its original planks is replaced, our research delves into an intriguing question: Does a text retain its original authorship when it undergoes numero…

    Submitted 6 June, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: To appear in Association for Computational Linguistics (ACL 2024)

  7. MULTITuDE: Large-Scale Multilingual Machine-Generated Text Detection Benchmark

    Authors: Dominik Macko, Robert Moro, Adaku Uchendu, Jason Samuel Lucas, Michiharu Yamashita, Matúš Pikuliak, Ivan Srba, Thai Le, Dongwon Lee, Jakub Simko, Maria Bielikova

    Abstract: There is a lack of research into capabilities of recent LLMs to generate convincing text in languages other than English and into performance of detectors of machine-generated text in multilingual settings. This is also reflected in the available benchmarks which lack authentic texts in languages other than English and predominantly cover older generators. To fill this gap, we introduce MULTITuDE,…

    Submitted 20 October, 2023; originally announced October 2023.

    Journal ref: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

  8. arXiv:2310.06656

    cs.AI cs.CR cs.NI

    Assessing the Impact of a Supervised Classification Filter on Flow-based Hybrid Network Anomaly Detection

    Authors: Dominik Macko, Patrik Goldschmidt, Peter Pištek, Daniela Chudá

    Abstract: Constant evolution and the emergence of new cyberattacks require the development of advanced techniques for defense. This paper aims to measure the impact of a supervised filter (classifier) in network anomaly detection. We perform our experiments by employing a hybrid anomaly detection approach in network flow data. For this purpose, we extended a state-of-the-art autoencoder-based anomaly detect…

    Submitted 10 October, 2023; originally announced October 2023.

  9. arXiv:2205.05573

    cs.CR

    A Longitudinal Study of Cryptographic API: a Decade of Android Malware

    Authors: Adam Janovsky, Davide Maiorca, Dominik Macko, Vashek Matyas, Giorgio Giacinto

    Abstract: Cryptography has been extensively used in Android applications to guarantee secure communications, conceal critical data from reverse engineering, or ensure mobile users' privacy. Various system-based and third-party libraries for Android provide cryptographic functionalities, and previous works mainly explored the misuse of cryptographic API in benign applications. However, the role of cryptograp…

    Submitted 6 July, 2022; v1 submitted 11 May, 2022; originally announced May 2022.

    Comments: Fix processing time data