Showing 1–10 of 10 results for author: Pecher, B

  1. arXiv:2406.12471  [pdf, other]

    cs.CL

    Fighting Randomness with Randomness: Mitigating Optimisation Instability of Fine-Tuning using Delayed Ensemble and Noisy Interpolation

    Authors: Branislav Pecher, Jan Cegin, Robert Belanec, Jakub Simko, Ivan Srba, Maria Bielikova

    Abstract: While fine-tuning of pre-trained language models generally helps to overcome the lack of labelled training samples, it also displays model performance instability. This instability mainly originates from randomness in initialisation or data shuffling. To address this, researchers either modify the training process or augment the available samples, which typically results in increased computational…

    Submitted 18 June, 2024; originally announced June 2024.

  2. arXiv:2402.12819  [pdf, other]

    cs.CL cs.AI cs.LG

    Comparing Specialised Small and General Large Language Models on Text Classification: 100 Labelled Samples to Achieve Break-Even Performance

    Authors: Branislav Pecher, Ivan Srba, Maria Bielikova

    Abstract: When solving NLP tasks with limited labelled data, researchers can either use a general large language model without further update, or use a small number of labelled examples to tune a specialised smaller model. In this work, we address the research gap of how many labelled samples are required for the specialised small models to outperform general large models, while taking the performance varia…

    Submitted 26 April, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

  3. arXiv:2402.12817  [pdf, other]

    cs.CL cs.AI cs.LG

    On Sensitivity of Learning with Limited Labelled Data to the Effects of Randomness: Impact of Interactions and Systematic Choices

    Authors: Branislav Pecher, Ivan Srba, Maria Bielikova

    Abstract: While learning with limited labelled data can improve performance when the labels are lacking, it is also sensitive to the effects of uncontrolled randomness introduced by so-called randomness factors (e.g., varying order of data). We propose a method to systematically investigate the effects of randomness factors while taking the interactions between them into consideration. To measure the true e…

    Submitted 20 February, 2024; originally announced February 2024.

  4. arXiv:2402.03038  [pdf, other]

    cs.LG cs.AI cs.CL

    Automatic Combination of Sample Selection Strategies for Few-Shot Learning

    Authors: Branislav Pecher, Ivan Srba, Maria Bielikova, Joaquin Vanschoren

    Abstract: In few-shot learning, such as meta-learning, few-shot fine-tuning or in-context learning, the limited number of samples used to train a model have a significant impact on the overall success. Although a large number of sample selection strategies exist, their impact on the performance of few-shot learning is not extensively known, as most of them have been so far evaluated in typical supervised se…

    Submitted 5 February, 2024; originally announced February 2024.

  5. arXiv:2401.06643  [pdf, other]

    cs.CL

    Effects of diversity incentives on sample diversity and downstream model performance in LLM-based text augmentation

    Authors: Jan Cegin, Branislav Pecher, Jakub Simko, Ivan Srba, Maria Bielikova, Peter Brusilovsky

    Abstract: The latest generative large language models (LLMs) have found their application in data augmentation tasks, where small numbers of text samples are LLM-paraphrased and then used to fine-tune downstream models. However, more research is needed to assess how different prompts, seed data selection strategies, filtering methods, or model settings affect the quality of paraphrased data (and downstream…

    Submitted 15 February, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: 24 pages, updated with new experiments: Mistral as downstream task classifier and a new method combination (of the taboo and hints methods)

  6. arXiv:2312.01082  [pdf, other]

    cs.LG cs.AI cs.CL

    On the Effects of Randomness on Stability of Learning with Limited Labelled Data: A Systematic Literature Review

    Authors: Branislav Pecher, Ivan Srba, Maria Bielikova

    Abstract: Learning with limited labelled data, such as few-shot learning, meta-learning or transfer learning, aims to effectively train a model using only small amount of labelled samples. However, these approaches were observed to be excessively sensitive to the effects of uncontrolled randomness caused by non-determinism in the training process. The randomness negatively affects the stability of the model…

    Submitted 2 December, 2023; originally announced December 2023.

  7. KInITVeraAI at SemEval-2023 Task 3: Simple yet Powerful Multilingual Fine-Tuning for Persuasion Techniques Detection

    Authors: Timo Hromadka, Timotej Smolen, Tomas Remis, Branislav Pecher, Ivan Srba

    Abstract: This paper presents the best-performing solution to the SemEval 2023 Task 3 on the subtask 3 dedicated to persuasion techniques detection. Due to a high multilingual character of the input data and a large number of 23 predicted labels (causing a lack of labelled data for some language-label combinations), we opted for fine-tuning pre-trained transformer-based language models. Conducting multiple…

    Submitted 24 April, 2023; originally announced April 2023.

    Comments: System paper within SemEval 2023 Task 3 on the subtask 3

    Journal ref: Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

  8. arXiv:2210.10085  [pdf, other]

    cs.IR cs.LG cs.SI

    Auditing YouTube's Recommendation Algorithm for Misinformation Filter Bubbles

    Authors: Ivan Srba, Robert Moro, Matus Tomlein, Branislav Pecher, Jakub Simko, Elena Stefancova, Michal Kompan, Andrea Hrckova, Juraj Podrouzek, Adrian Gavornik, Maria Bielikova

    Abstract: In this paper, we present results of an auditing study performed over YouTube aimed at investigating how fast a user can get into a misinformation filter bubble, but also what it takes to "burst the bubble", i.e., revert the bubble enclosure. We employ a sock puppet audit methodology, in which pre-programmed agents (acting as YouTube users) delve into misinformation filter bubbles by watching misi…

    Submitted 18 October, 2022; originally announced October 2022.

    Comments: Just accepted to ACM Transactions on Recommender Systems (ACM TORS). arXiv admin note: substantial text overlap with arXiv:2203.13769

    Journal ref: ACM Transactions on Recommender Systems. 1, 1, Article 6 (March 2023), 33 pages

  9. arXiv:2204.12294  [pdf, other]

    cs.CL cs.CY cs.IR cs.LG

    Monant Medical Misinformation Dataset: Mapping Articles to Fact-Checked Claims

    Authors: Ivan Srba, Branislav Pecher, Matus Tomlein, Robert Moro, Elena Stefancova, Jakub Simko, Maria Bielikova

    Abstract: False information has a significant negative influence on individuals as well as on the whole society. Especially in the current COVID-19 era, we witness an unprecedented growth of medical misinformation. To help tackle this problem with machine learning approaches, we are publishing a feature-rich dataset of approx. 317k medical news articles/blogs and 3.5k fact-checked claims. It also contains 5…

    Submitted 26 April, 2022; originally announced April 2022.

    Comments: 11 pages, 4 figures, SIGIR 2022 Resource paper track

    Journal ref: ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2022)

  10. An Audit of Misinformation Filter Bubbles on YouTube: Bubble Bursting and Recent Behavior Changes

    Authors: Matus Tomlein, Branislav Pecher, Jakub Simko, Ivan Srba, Robert Moro, Elena Stefancova, Michal Kompan, Andrea Hrckova, Juraj Podrouzek, Maria Bielikova

    Abstract: The negative effects of misinformation filter bubbles in adaptive systems have been known to researchers for some time. Several studies investigated, most prominently on YouTube, how fast a user can get into a misinformation filter bubble simply by selecting wrong choices from the items offered. Yet, no studies so far have investigated what it takes to burst the bubble, i.e., revert the bubble enc…

    Submitted 25 March, 2022; originally announced March 2022.

    Comments: RecSys '21: Fifteenth ACM Conference on Recommender Systems

    Journal ref: RecSys '21: Fifteenth ACM Conference on Recommender Systems, 2021