
Showing 1–7 of 7 results for author: Sido, J

  1. arXiv:2306.06196 [pdf, other]

    cs.LG cs.AI eess.SP

    ElectroCardioGuard: Preventing Patient Misidentification in Electrocardiogram Databases through Neural Networks

    Authors: Michal Seják, Jakub Sido, David Žahour

    Abstract: Electrocardiograms (ECGs) are commonly used by cardiologists to detect heart-related pathological conditions. Reliable collections of ECGs are crucial for precise diagnosis. However, in clinical practice, the assignment of captured ECG recordings to incorrect patients can occur inadvertently. In collaboration with a clinical and research facility which recognized this challenge and reached out to…

    Submitted 19 September, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: 28 pages, 5 figures, 8 tables

  2. arXiv:2209.07841 [pdf, other]

    cs.CL

    Findings of the Shared Task on Multilingual Coreference Resolution

    Authors: Zdeněk Žabokrtský, Miloslav Konopík, Anna Nedoluzhko, Michal Novák, Maciej Ogrodniczuk, Martin Popel, Ondřej Pražák, Jakub Sido, Daniel Zeman, Yilun Zhu

    Abstract: This paper presents an overview of the shared task on multilingual coreference resolution associated with the CRAC 2022 workshop. Shared task participants were supposed to develop trainable systems capable of identifying mentions and clustering them according to identity coreference. The public edition of CorefUD 1.0, which contains 13 datasets for 10 languages, was used as the source of training…

    Submitted 16 September, 2022; originally announced September 2022.

  3. arXiv:2203.14093 [pdf, other]

    cs.CL cs.LG cs.PL cs.SE

    MQDD: Pre-training of Multimodal Question Duplicity Detection for Software Engineering Domain

    Authors: Jan Pašek, Jakub Sido, Miloslav Konopík, Ondřej Pražák

    Abstract: This work proposes a new pipeline for leveraging data collected on the Stack Overflow website for pre-training a multimodal model for searching duplicates on question answering websites. Our multimodal model is trained on question descriptions and source code in multiple programming languages. We design two new learning objectives to improve duplicate detection capabilities. The result of this wo…

    Submitted 29 March, 2022; v1 submitted 26 March, 2022; originally announced March 2022.

  4. arXiv:2108.08708 [pdf, other]

    cs.CL cs.AI cs.LG cs.NE

    Czech News Dataset for Semantic Textual Similarity

    Authors: Jakub Sido, Michal Seják, Ondřej Pražák, Miloslav Konopík, Václav Moravec

    Abstract: This paper describes a novel dataset consisting of sentences with semantic similarity annotations. The data originate from the journalistic domain in the Czech language. We describe the process of collecting and annotating the data in detail. The dataset contains 138,556 human annotations divided into train and test sets. In total, 485 journalism students participated in the creation process. To i…

    Submitted 21 January, 2022; v1 submitted 19 August, 2021; originally announced August 2021.

  5. arXiv:2107.12088 [pdf, other]

    cs.CL

    Multilingual Coreference Resolution with Harmonized Annotations

    Authors: Ondřej Pražák, Miloslav Konopík, Jakub Sido

    Abstract: In this paper, we present coreference resolution experiments with a newly created multilingual corpus CorefUD. We focus on the following languages: Czech, Russian, Polish, German, Spanish, and Catalan. In addition to monolingual experiments, we combine the training data in multilingual experiments and train two joint models: one for Slavic languages and one for all the languages together. We rely on an…

    Submitted 3 September, 2021; v1 submitted 26 July, 2021; originally announced July 2021.

  6. arXiv:2103.13031 [pdf, other]

    cs.CL

    Czert -- Czech BERT-like Model for Language Representation

    Authors: Jakub Sido, Ondřej Pražák, Pavel Přibáň, Jan Pašek, Michal Seják, Miloslav Konopík

    Abstract: This paper describes the training process of the first Czech monolingual language representation models based on BERT and ALBERT architectures. We pre-train our models on more than 340K sentences, which is 50 times more than multilingual models that include Czech data. We outperform the multilingual models on 9 out of 11 datasets. In addition, we establish new state-of-the-art results on ni…

    Submitted 20 August, 2021; v1 submitted 24 March, 2021; originally announced March 2021.

    Comments: 13 pages

  7. arXiv:2012.00004 [pdf, other]

    cs.CL

    UWB at SemEval-2020 Task 1: Lexical Semantic Change Detection

    Authors: Ondřej Pražák, Pavel Přibáň, Stephen Taylor, Jakub Sido

    Abstract: In this paper, we describe our method for the detection of lexical semantic change, i.e., word sense changes over time. We examine semantic differences between specific words in two corpora, chosen from different time periods, for English, German, Latin, and Swedish. Our method was created for the SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection. We ranked 1st in S…

    Submitted 30 November, 2020; originally announced December 2020.

    Comments: arXiv admin note: substantial text overlap with arXiv:2011.14678