Skip to main content

Showing 1–11 of 11 results for author: Kocoń, J

.
  1. arXiv:2404.05892  [pdf, other

    cs.CL cs.AI

    Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

    Authors: Bo Peng, Daniel Goldstein, Quentin Anthony, Alon Albalak, Eric Alcaide, Stella Biderman, Eugene Cheah, Xingjian Du, Teddy Ferdinan, Haowen Hou, Przemysław Kazienko, Kranthi Kiran GV, Jan Kocoń, Bartłomiej Koptyra, Satyapriya Krishna, Ronald McClelland Jr., Niklas Muennighoff, Fares Obeid, Atsushi Saito, Guangyu Song, Haoqin Tu, Stanisław Woźniak, Ruichong Zhang, Bingchen Zhao, Qihang Zhao , et al. (3 additional authors not shown)

    Abstract: We present Eagle (RWKV-5) and Finch (RWKV-6), sequence models improving upon the RWKV (RWKV-4) architecture. Our architectural design advancements include multi-headed matrix-valued states and a dynamic recurrence mechanism that improve expressivity while maintaining the inference efficiency characteristics of RNNs. We introduce a new multilingual corpus with 1.12 trillion tokens and a fast tokeni… ▽ More

    Submitted 10 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

  2. arXiv:2402.09269  [pdf, other

    cs.CL cs.AI

    Personalized Large Language Models

    Authors: Stanisław Woźniak, Bartłomiej Koptyra, Arkadiusz Janz, Przemysław Kazienko, Jan Kocoń

    Abstract: Large language models (LLMs) have significantly advanced Natural Language Processing (NLP) tasks in recent years. However, their universal nature poses limitations in scenarios requiring personalized responses, such as recommendation systems and chatbots. This paper investigates methods to personalize LLMs, comparing fine-tuning and zero-shot reasoning approaches on subjective tasks. Results demon… ▽ More

    Submitted 14 February, 2024; originally announced February 2024.

  3. arXiv:2402.09147  [pdf, other

    cs.AI

    Into the Unknown: Self-Learning Large Language Models

    Authors: Teddy Ferdinan, Jan Kocoń, Przemysław Kazienko

    Abstract: We address the main problem of self-learning LLM: the question of what to learn. We propose a self-learning LLM framework that enables an LLM to independently learn previously unknown knowledge through selfassessment of their own hallucinations. Using the hallucination score, we introduce a new concept of Points in the Unknown (PiUs), along with one extrinsic and three intrinsic methods for automa… ▽ More

    Submitted 4 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: 16 pages, 12 figures, 4 tables, submitted to ACL SRW 2024

  4. arXiv:2312.08198  [pdf, other

    cs.CL cs.AI

    Towards Model-Based Data Acquisition for Subjective Multi-Task NLP Problems

    Authors: Kamil Kanclerz, Julita Bielaniewicz, Marcin Gruza, Jan Kocon, Stanisław Woźniak, Przemysław Kazienko

    Abstract: Data annotated by humans is a source of knowledge by describing the peculiarities of the problem and therefore fueling the decision process of the trained model. Unfortunately, the annotation process for subjective natural language processing (NLP) problems like offensiveness or emotion detection is often very expensive and time-consuming. One of the inevitable risks is to spend some of the funds… ▽ More

    Submitted 13 December, 2023; originally announced December 2023.

  5. arXiv:2312.06034  [pdf, other

    cs.AI

    Modeling Uncertainty in Personalized Emotion Prediction with Normalizing Flows

    Authors: Piotr Miłkowski, Konrad Karanowski, Patryk Wielopolski, Jan Kocoń, Przemysław Kazienko, Maciej Zięba

    Abstract: Designing predictive models for subjective problems in natural language processing (NLP) remains challenging. This is mainly due to its non-deterministic nature and different perceptions of the content by different humans. It may be solved by Personalized Natural Language Processing (PNLP), where the model exploits additional information about the reader to make more accurate predictions. However,… ▽ More

    Submitted 10 December, 2023; originally announced December 2023.

    Comments: 10 pages, 8 figures, SENTIRE'23 (ICDM 2023)

  6. arXiv:2312.04720  [pdf, other

    cs.CL cs.AI cs.LG

    From Big to Small Without Losing It All: Text Augmentation with ChatGPT for Efficient Sentiment Analysis

    Authors: Stanisław Woźniak, Jan Kocoń

    Abstract: In the era of artificial intelligence, data is gold but costly to annotate. The paper demonstrates a groundbreaking solution to this dilemma using ChatGPT for text augmentation in sentiment analysis. We leverage ChatGPT's generative capabilities to create synthetic training data that significantly improves the performance of smaller models, making them competitive with, or even outperforming, thei… ▽ More

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: 10 pages, 9 figures, presented at ICDM Workshop: SENTIRE 2023

  7. arXiv:2312.04715  [pdf, other

    cs.CL cs.AI cs.LG

    Deep Emotions Across Languages: A Novel Approach for Sentiment Propagation in Multilingual WordNets

    Authors: Jan Kocoń

    Abstract: Sentiment analysis involves using WordNets enriched with emotional metadata, which are valuable resources. However, manual annotation is time-consuming and expensive, resulting in only a few WordNet Lexical Units being annotated. This paper introduces two new techniques for automatically propagating sentiment annotations from a partially annotated WordNet to its entirety and to a WordNet in a diff… ▽ More

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: 6 pages, 1 figure, presented at ICDM Workshop: SENTIRE 2023

  8. arXiv:2305.13048  [pdf, other

    cs.CL cs.AI

    RWKV: Reinventing RNNs for the Transformer Era

    Authors: Bo Peng, Eric Alcaide, Quentin Anthony, Alon Albalak, Samuel Arcadinho, Stella Biderman, Huanqi Cao, Xin Cheng, Michael Chung, Matteo Grella, Kranthi Kiran GV, Xuzheng He, Haowen Hou, Jiaju Lin, Przemyslaw Kazienko, Jan Kocon, Jiaming Kong, Bartlomiej Koptyra, Hayden Lau, Krishna Sri Ipsit Mantri, Ferdinand Mom, Atsushi Saito, Guangyu Song, Xiangru Tang, Bolun Wang , et al. (9 additional authors not shown)

    Abstract: Transformers have revolutionized almost all natural language processing (NLP) tasks but suffer from memory and computational complexity that scales quadratically with sequence length. In contrast, recurrent neural networks (RNNs) exhibit linear scaling in memory and computational requirements but struggle to match the same performance as Transformers due to limitations in parallelization and scala… ▽ More

    Submitted 10 December, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

  9. arXiv:2302.10724  [pdf, other

    cs.CL cs.AI cs.CY cs.LG

    ChatGPT: Jack of all trades, master of none

    Authors: Jan Kocoń, Igor Cichecki, Oliwier Kaszyca, Mateusz Kochanek, Dominika Szydło, Joanna Baran, Julita Bielaniewicz, Marcin Gruza, Arkadiusz Janz, Kamil Kanclerz, Anna Kocoń, Bartłomiej Koptyra, Wiktoria Mieleszczenko-Kowszewicz, Piotr Miłkowski, Marcin Oleksy, Maciej Piasecki, Łukasz Radliński, Konrad Wojtasik, Stanisław Woźniak, Przemysław Kazienko

    Abstract: OpenAI has released the Chat Generative Pre-trained Transformer (ChatGPT) and revolutionized the approach in artificial intelligence to human-model interaction. Several publications on ChatGPT evaluation test its effectiveness on well-known natural language processing (NLP) tasks. However, the existing studies are mostly non-automated and tested on a very limited scale. In this work, we examined C… ▽ More

    Submitted 9 June, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

    Comments: preprint

    Journal ref: Information Fusion 101861 (2023)

  10. arXiv:2206.04615  [pdf, other

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza , et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur… ▽ More

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  11. arXiv:1904.04055  [pdf, ps, other

    cs.CL cs.LG stat.ML

    Evaluating KGR10 Polish word embeddings in the recognition of temporal expressions using BiLSTM-CRF

    Authors: Jan Kocoń, Michał Gawor

    Abstract: The article introduces a new set of Polish word embeddings, built using KGR10 corpus, which contains more than 4 billion words. These embeddings are evaluated in the problem of recognition of temporal expressions (timexes) for the Polish language. We described the process of KGR10 corpus creation and a new approach to the recognition problem using Bidirectional Long-Short Term Memory (BiLSTM) netw… ▽ More

    Submitted 3 April, 2019; originally announced April 2019.

    Comments: Presented at TFML 2019 (Theoretical Foundations of Machine Learning)