Showing 1–23 of 23 results for author: Rosa, R

Searching in archive cs.
  1. arXiv:2405.13057  [pdf, other]

    cs.SE cs.AI

    Can Github issues be solved with Tree Of Thoughts?

    Authors: Ricardo La Rosa, Corey Hulse, Bangdi Liu

    Abstract: While there have been extensive studies in code generation by large language models (LLM), where benchmarks like HumanEval have been surpassed with an impressive 96.3% success rate, these benchmarks predominantly judge a model's performance on basic function-level code generation and lack the critical thinking and concept of scope required of real-world scenarios such as solving GitHub issues. Thi…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 8 pages, 2 figures, 7 tables

  2. arXiv:2404.08974  [pdf, other]

    cs.CL

    OOVs in the Spotlight: How to Inflect them?

    Authors: Tomáš Sourada, Jana Straková, Rudolf Rosa

    Abstract: We focus on morphological inflection in out-of-vocabulary (OOV) conditions, an under-researched subtask in which state-of-the-art systems usually are less effective. We developed three systems: a retrograde model and two sequence-to-sequence (seq2seq) models based on LSTM and Transformer. For testing in OOV conditions, we automatically extracted a large dataset of nouns in the morphologically rich…

    Submitted 28 May, 2024; v1 submitted 13 April, 2024; originally announced April 2024.

    Comments: Published in the proceedings of LREC-COLING 2024. 12 pages, 3 figures

    Journal ref: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pp. 12455-12466

  3. AsQM: Audio streaming Quality Metric based on Network Impairments and User Preferences

    Authors: Marcelo Rodrigo dos Santos, Andreza Patrícia Batista, Renata Lopes Rosa, Muhammad Saadi, Dick Carrillo Melgarejo, Demóstenes Zegarra Rodríguez

    Abstract: There are many users of audio streaming services because of the proliferation of cloud-based audio streaming services for different content. The complex networks that support these services do not always guarantee an acceptable quality on the end-user side. In this paper, the impact of temporal interruptions on the reproduction of audio streaming and the users preference in relation to audio conte…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: 11 pages

    Journal ref: IEEE Transactions on Consumer Electronics, vol. 69, no. 3, pp. 408-420, Aug. 2023

  4. arXiv:2308.06266  [pdf, other]

    cs.HC

    $n$ Walks in the Fictional Woods

    Authors: Victor Schetinger, Sara Di Bartolomeo, Edirlei Soares de Lima, Christofer Meinecke, Rudolf Rosa

    Abstract: This paper presents a novel exploration of the interaction between generative AI models, visualization, and narrative generation processes, using OpenAI's GPT as a case study. We look at the question "Where Does Generativeness Comes From", which has a simple answer at the intersection of many domains. Drawing on Umberto Eco's "Six Walks in the Fictional Woods", we engender a speculative, transdisc…

    Submitted 23 August, 2023; v1 submitted 13 July, 2023; originally announced August 2023.

    Comments: this is a submission for IEEE alt.vis 2023

  5. arXiv:2210.15506  [pdf, other]

    quant-ph cs.PL

    Programming with Quantum Mechanics

    Authors: Evandro C. R. da Rosa, Claudio Lima

    Abstract: Quantum computing is an emerging paradigm that opens a new era for exponential computational speedup. Still, quantum computers have yet to be ready for commercial use. However, it is essential to train and qualify today the workforce that will develop quantum acceleration solutions to get the quantum advantage in the future. This tutorial gives a broad view of quantum computing, abstracting most o…

    Submitted 27 October, 2022; originally announced October 2022.

  6. arXiv:2206.08425  [pdf, other]

    cs.CL

    DialogueScript: Using Dialogue Agents to Produce a Script

    Authors: Patrícia Schmidtová, Dávid Javorský, Christián Mikláš, Tomáš Musil, Rudolf Rosa, Ondřej Dušek

    Abstract: We present a novel approach to generating scripts by using agents with different personality types. To manage character interaction in the script, we employ simulated dramatic networks. Automatic and human evaluation on multiple criteria shows that our approach outperforms a vanilla-GPT2-based baseline. We further introduce a new metric to evaluate dialogue consistency based on natural language in…

    Submitted 16 June, 2022; originally announced June 2022.

    Comments: Non-archival paper at the 4th Workshop on Narrative Understanding (WNU 2022)

  7. arXiv:2102.08892  [pdf, ps, other]

    cs.CL cs.HC

    THEaiTRE 1.0: Interactive generation of theatre play scripts

    Authors: Rudolf Rosa, Tomáš Musil, Ondřej Dušek, Dominik Jurko, Patrícia Schmidtová, David Mareček, Ondřej Bojar, Tom Kocmi, Daniel Hrbek, David Košťák, Martina Kinská, Marie Nováková, Josef Doležal, Klára Vosecká, Tomáš Studeník, Petr Žabka

    Abstract: We present the first version of a system for interactive generation of theatre play scripts. The system is based on a vanilla GPT-2 model with several adjustments, targeting specific issues we encountered in practice. We also list other issues we encountered but plan to only solve in a future version of the system. The presented system was used to generate a theatre play script planned for premier…

    Submitted 17 February, 2021; originally announced February 2021.

    Comments: Submitted to Text2Story workshop 2021

    Journal ref: Proc. Text2Story (2021) 71-76

  8. Predicting Typological Features in WALS using Language Embeddings and Conditional Probabilities: ÚFAL Submission to the SIGTYP 2020 Shared Task

    Authors: Martin Vastl, Daniel Zeman, Rudolf Rosa

    Abstract: We present our submission to the SIGTYP 2020 Shared Task on the prediction of typological features. We submit a constrained system, predicting typological features only based on the WALS database. We investigate two approaches. The simpler of the two is a system based on estimating correlation of feature values within languages by computing conditional probabilities and mutual information. The sec…

    Submitted 8 October, 2020; originally announced October 2020.

    Journal ref: Proc. SIGTYP Workshop on Computational Research in Linguistic Typology (2020) 29-35

  9. Measuring Memorization Effect in Word-Level Neural Networks Probing

    Authors: Rudolf Rosa, Tomáš Musil, David Mareček

    Abstract: Multiple studies have probed representations emerging in neural networks trained for end-to-end NLP tasks and examined what word-level linguistic information may be encoded in the representations. In classical probing, a classifier is trained on the representations to extract the target linguistic information. However, there is a threat of the classifier simply memorizing the linguistic labels for…

    Submitted 29 June, 2020; originally announced June 2020.

    Comments: Accepted to TSD 2020. Will be published in Springer LNCS

    Journal ref: LNCS 12284, TSD (2020) 180-188

  10. arXiv:2006.14668  [pdf, ps, other]

    cs.CL

    THEaiTRE: Artificial Intelligence to Write a Theatre Play

    Authors: Rudolf Rosa, Ondřej Dušek, Tom Kocmi, David Mareček, Tomáš Musil, Patrícia Schmidtová, Dominik Jurko, Ondřej Bojar, Daniel Hrbek, David Košťák, Martina Kinská, Josef Doležal, Klára Vosecká

    Abstract: We present THEaiTRE, a starting project aimed at automatic generation of theatre play scripts. This paper reviews related work and drafts an approach we intend to follow. We plan to adopt generative neural language models and hierarchical generation approaches, supported by summarization and machine translation methods, and complemented with a human-in-the-loop approach.

    Submitted 25 June, 2020; originally announced June 2020.

    Comments: accepted to AI4Narratives2020

    Journal ref: Proc. AI4Narratives (2020) 9-13

  11. arXiv:2006.00131  [pdf, other]

    quant-ph cs.PL

    Classical and Quantum Data Interaction in Programming Languages: A Runtime Architecture

    Authors: Evandro Chagas Ribeiro da Rosa, Rafael de Santiago

    Abstract: We propose a runtime architecture that can be used in the development of a quantum programming language and its programming environment. The proposed runtime architecture enables dynamic interaction between classical and quantum data following the restriction that a quantum computer is available in the cloud as a batch computer, with no interaction with the classical computer during its execution.…

    Submitted 29 May, 2020; originally announced June 2020.

  12. Universal Dependencies according to BERT: both more specific and more general

    Authors: Tomasz Limisiewicz, Rudolf Rosa, David Mareček

    Abstract: This work focuses on analyzing the form and extent of syntactic abstraction captured by BERT by extracting labeled dependency trees from self-attentions. Previous work showed that individual BERT heads tend to encode particular dependency relation types. We extend these findings by explicitly comparing BERT relations to Universal Dependencies (UD) annotations, showing that they often do not matc…

    Submitted 6 October, 2020; v1 submitted 30 April, 2020; originally announced April 2020.

    Journal ref: Findings of the Association for Computational Linguistics: EMNLP 2020

  13. arXiv:2004.05160  [pdf, other]

    cs.CL

    On the Language Neutrality of Pre-trained Multilingual Representations

    Authors: Jindřich Libovický, Rudolf Rosa, Alexander Fraser

    Abstract: Multilingual contextual embeddings, such as multilingual BERT and XLM-RoBERTa, have proved useful for many multi-lingual tasks. Previous work probed the cross-linguality of the representations indirectly using zero-shot transfer learning on morphological and syntactic tasks. We instead investigate the language-neutrality of multilingual contextual embeddings directly and with respect to lexical se…

    Submitted 29 September, 2020; v1 submitted 9 April, 2020; originally announced April 2020.

    Comments: 12 pages, 3 figures. arXiv admin note: text overlap with arXiv:1911.03310. Accepted to Findings of EMNLP 2020

  14. arXiv:1911.03310  [pdf, other]

    cs.CL

    How Language-Neutral is Multilingual BERT?

    Authors: Jindřich Libovický, Rudolf Rosa, Alexander Fraser

    Abstract: Multilingual BERT (mBERT) provides sentence representations for 104 languages, which are useful for many multi-lingual tasks. Previous work probed the cross-linguality of mBERT using zero-shot transfer learning on morphological and syntactic tasks. We instead focus on the semantic properties of mBERT. We show that mBERT representations can be split into a language-specific component and a language…

    Submitted 8 November, 2019; originally announced November 2019.

    Comments: 6 pages, 3 figures

  15. arXiv:1908.08528  [pdf, ps, other]

    cs.CL

    Unsupervised Lemmatization as Embeddings-Based Word Clustering

    Authors: Rudolf Rosa, Zdeněk Žabokrtský

    Abstract: We focus on the task of unsupervised lemmatization, i.e. grouping together inflected forms of one word under one label (a lemma) without the use of annotated training data. We propose to perform agglomerative clustering of word forms with a novel distance measure. Our distance measure is based on the observation that inflections of the same word tend to be similar both string-wise and in meaning.…

    Submitted 22 August, 2019; originally announced August 2019.

  16. arXiv:1906.11511  [pdf, other]

    cs.CL

    Inducing Syntactic Trees from BERT Representations

    Authors: Rudolf Rosa, David Mareček

    Abstract: We use the English model of BERT and explore how a deletion of one word in a sentence changes representations of other words. Our hypothesis is that removing a reducible word (e.g. an adjective) does not affect the representation of other words so much as removing e.g. the main verb, which makes the sentence ungrammatical and of "high surprise" for the language model. We estimate reducibilities of…

    Submitted 27 June, 2019; originally announced June 2019.

    Comments: Accepted abstract for the BlackboxNLP 2019

  17. arXiv:1906.01958  [pdf, other]

    cs.CL cs.LG

    From Balustrades to Pierre Vinken: Looking for Syntax in Transformer Self-Attentions

    Authors: David Mareček, Rudolf Rosa

    Abstract: We inspect the multi-head self-attention in Transformer NMT encoders for three source languages, looking for patterns that could have a syntactic interpretation. In many of the attention heads, we frequently find sequences of consecutive states attending to the same position, which resemble syntactic phrases. We propose a transparent deterministic method of quantifying the amount of syntactic info…

    Submitted 5 June, 2019; originally announced June 2019.

    Comments: Accepted at BlackboxNLP 2019

  18. Network Service Orchestration: A Survey

    Authors: Nathan F. Saraiva de Sousa, Danny A. Lachos Perez, Raphael V. Rosa, Mateus A. S. Santos, Christian Esteve Rothenberg

    Abstract: Business models of network service providers are undergoing an evolving transformation fueled by vertical customer demands and technological advances such as 5G, Software Defined Networking (SDN), and Network Function Virtualization (NFV). Emerging scenarios call for agile network services consuming network, storage, and compute resources across heterogeneous infrastructures and administrative dom…

    Submitted 17 May, 2019; v1 submitted 17 March, 2018; originally announced March 2018.

    Comments: Accepted for publication at Computer Communications Journal

  19. arXiv:1604.03278  [pdf, other]

    stat.ML cs.LG

    Confidence Decision Trees via Online and Active Learning for Streaming (BIG) Data

    Authors: Rocco De Rosa

    Abstract: Decision tree classifiers are a widely used tool in data stream mining. The use of confidence intervals to estimate the gain associated with each split leads to very effective methods, like the popular Hoeffding tree algorithm. From a statistical viewpoint, the analysis of decision tree classifiers in a streaming setting requires knowing when enough new information has been collected to justify sp…

    Submitted 12 April, 2016; originally announced April 2016.

  20. arXiv:1604.02855  [pdf, other]

    stat.ML cs.CV cs.LG

    Active Learning for Online Recognition of Human Activities from Streaming Videos

    Authors: Rocco De Rosa, Ilaria Gori, Fabio Cuzzolin, Barbara Caputo, Nicolò Cesa-Bianchi

    Abstract: Recognising human activities from streaming videos poses unique challenges to learning algorithms: predictive models need to be scalable, incrementally trainable, and must remain bounded in size even when the data stream is arbitrarily long. Furthermore, as parameter tuning is problematic in a streaming setting, suitable approaches should be parameterless, and make no assumptions on what class lab…

    Submitted 11 April, 2016; originally announced April 2016.

  21. arXiv:1604.02275  [pdf, other]

    cs.CV cs.LG stat.ML

    Online Open World Recognition

    Authors: Rocco De Rosa, Thomas Mensink, Barbara Caputo

    Abstract: As we enter into the big data age and an avalanche of images have become readily available, recognition systems face the need to move from close, lab settings where the number of classes and training data are fixed, to dynamic scenarios where the number of categories to be recognized grows continuously over time, as well as new data providing useful information to update the system. Recent attempt…

    Submitted 8 April, 2016; originally announced April 2016.

    Comments: Keywords: Open world recognition, Open set, Incremental Learning, Metric Learning, Nonparametric methods, Classification confidence

  22. arXiv:1508.04912  [pdf, other]

    stat.ML cs.LG

    The ABACOC Algorithm: a Novel Approach for Nonparametric Classification of Data Streams

    Authors: Rocco De Rosa, Francesco Orabona, Nicolò Cesa-Bianchi

    Abstract: Stream mining poses unique challenges to machine learning: predictive models are required to be scalable, incrementally trainable, must remain bounded in size (even when the data stream is arbitrarily long), and be nonparametric in order to achieve high accuracy even in complex and dynamic environments. Moreover, the learning system must be parameterless ---traditional tuning methods are problemat…

    Submitted 20 August, 2015; originally announced August 2015.

  23. arXiv:1506.04897  [pdf, ps, other]

    cs.CL

    Parsing Natural Language Sentences by Semi-supervised Methods

    Authors: Rudolf Rosa

    Abstract: We present our work on semi-supervised parsing of natural language sentences, focusing on multi-source crosslingual transfer of delexicalized dependency parsers. We first evaluate the influence of treebank annotation styles on parsing performance, focusing on adposition attachment style. Then, we present KLcpos3, an empirical language similarity measure, designed and tuned for source parser weight…

    Submitted 16 June, 2015; originally announced June 2015.

    Comments: Dissertation interim report. Overlap with papers accepted to ACL 2015 and Depling 2015, and a paper under review at IWPT 2015

    Report number: 3039210042125978224; ACM Class: I.2.7