
Showing 1–13 of 13 results for author: Martinc, M

  1. arXiv:2404.05281

    cs.CL

    Multi-Task Learning for Features Extraction in Financial Annual Reports

    Authors: Syrielle Montariol, Matej Martinc, Andraž Pelicon, Senja Pollak, Boshko Koloski, Igor Lončarski, Aljoša Valentinčič

    Abstract: For assessing various performance indicators of companies, the focus is shifting from strictly financial (quantitative) publicly disclosed information to qualitative (textual) information. This textual data can provide valuable weak signals, for example through stylistic features, which can complement the quantitative data on financial performance or on Environmental, Social and Governance (ESG) c…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: Accepted at the MIDAS Workshop at ECML-PKDD 2022

  2. arXiv:2402.16596

    cs.CL

    Semantic change detection for Slovene language: a novel dataset and an approach based on optimal transport

    Authors: Marko Pranjić, Kaja Dobrovoljc, Senja Pollak, Matej Martinc

    Abstract: In this paper, we focus on the detection of semantic changes in Slovene, a less resourced Slavic language with two million speakers. Detecting and tracking semantic changes provides insights into the evolution of the language caused by changes in society and culture. Recently, several systems have been proposed to aid in this study, but all depend on manually annotated gold standard datasets for e…

    Submitted 26 February, 2024; originally announced February 2024.

    ACM Class: I.2.7

  3. arXiv:2301.06767

    cs.CL

    The Recent Advances in Automatic Term Extraction: A survey

    Authors: Hanh Thi Hong Tran, Matej Martinc, Jaya Caporusso, Antoine Doucet, Senja Pollak

    Abstract: Automatic term extraction (ATE) is a Natural Language Processing (NLP) task that eases the effort of manually identifying terms from domain-specific corpora by providing a list of candidate terms. As units of knowledge in a specific field of expertise, extracted terms not only benefit several terminographical tasks but also support and improve several complex downstream tasks, e.g., in…

    Submitted 17 January, 2023; originally announced January 2023.

    Comments: 25 pages, 4 figures, 3 tables

    ACM Class: A.1

  4. Ensembling Transformers for Cross-domain Automatic Term Extraction

    Authors: Hanh Thi Hong Tran, Matej Martinc, Andraz Pelicon, Antoine Doucet, Senja Pollak

    Abstract: Automatic term extraction plays an essential role in domain language understanding and several natural language processing downstream tasks. In this paper, we propose a comparative study on the predictive power of Transformer-based pretrained language models for term extraction in a multi-language cross-domain setting. Besides evaluating the ability of monolingual models to extract single- and…

    Submitted 11 December, 2022; originally announced December 2022.

    Comments: 11 pages including references, 3 figures, 2 tables

    Journal ref: International Conference on Asian Digital Libraries (ICADL 2022)

  5. arXiv:2203.16885

    cs.CL

    A bilingual approach to specialised adjectives through word embeddings in the karstology domain

    Authors: Larisa Grčić Simeunović, Matej Martinc, Špela Vintar

    Abstract: We present an experiment in extracting adjectives which express a specific semantic relation using word embeddings. The results of the experiment are then thoroughly analysed and categorised into groups of adjectives exhibiting formal or semantic similarity. The experiment and analysis are performed for English and Croatian in the domain of karstology using data sets and methods developed in the T…

    Submitted 31 March, 2022; originally announced March 2022.

    Comments: The paper is published as part of TOTH 2020 proceedings (https://btk.univ-smb.fr/livres/toth-2020/)

  6. arXiv:2202.06650

    cs.CL cs.LG

    Out of Thin Air: Is Zero-Shot Cross-Lingual Keyword Detection Better Than Unsupervised?

    Authors: Boshko Koloski, Senja Pollak, Blaž Škrlj, Matej Martinc

    Abstract: Keyword extraction is the task of retrieving words that are essential to the content of a given document. Researchers have proposed various approaches to this problem. At the top level, approaches are divided into those that require training (supervised) and those that do not (unsupervised). In this study, we are interested in settings where, for a language under investigation, no training da…

    Submitted 14 February, 2022; originally announced February 2022.

  7. arXiv:2102.00472  [pdf, other

    cs.CL

    Extending Neural Keyword Extraction with TF-IDF tagset matching

    Authors: Boshko Koloski, Senja Pollak, Blaž Škrlj, Matej Martinc

    Abstract: Keyword extraction is the task of identifying words (or multi-word expressions) that best describe a given document; in news portals, such keywords serve to link articles on similar topics. In this work we develop and evaluate our methods on four novel data sets covering less represented, morphologically rich languages in the European news media industry (Croatian, Estonian, Latvian and Russian). First, we perfor…

    Submitted 14 February, 2022; v1 submitted 31 January, 2021; originally announced February 2021.

    Comments: The final formatted version of this publication was published in Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (EACL 2021), Online, April, 2021 and is available online at https://www.aclweb.org/anthology/2021.hackashop-1.4

  8. COVID-19 therapy target discovery with context-aware literature mining

    Authors: Matej Martinc, Blaž Škrlj, Sergej Pirkmajer, Nada Lavrač, Bojan Cestnik, Martin Marzidovšek, Senja Pollak

    Abstract: The abundance of literature related to the widespread COVID-19 pandemic is beyond what a single expert can inspect manually. Developing systems capable of automatically processing tens of thousands of scientific publications, with the aim of enriching existing empirical evidence with literature-based associations, is both challenging and relevant. We propose a system for contextualization of empirical expre…

    Submitted 9 November, 2020; v1 submitted 30 July, 2020; originally announced July 2020.

    Comments: Accepted to the 23rd International Conference on Discovery Science (DS 2020)

  9. TNT-KID: Transformer-based Neural Tagger for Keyword Identification

    Authors: Matej Martinc, Blaž Škrlj, Senja Pollak

    Abstract: With growing amounts of available textual data, the development of algorithms capable of automatic analysis, categorization and summarization of these data has become a necessity. In this research we present a novel algorithm for keyword identification, i.e., the extraction of single- or multi-word phrases representing key aspects of a given document, called Transformer-based Neural Tagger for Keyword IDen…

    Submitted 30 November, 2021; v1 submitted 20 March, 2020; originally announced March 2020.

    Comments: Accepted to the Natural Language Engineering journal

    Journal ref: Martinc, M., Škrlj, B., & Pollak, S. (2021). TNT-KID: Transformer-based neural tagger for keyword identification. Natural Language Engineering, 1-40. doi:10.1017/S1351324921000127

  10. Capturing Evolution in Word Usage: Just Add More Clusters?

    Authors: Matej Martinc, Syrielle Montariol, Elaine Zosa, Lidia Pivovarova

    Abstract: The way words are used evolves over time, mirroring the cultural or technological evolution of society. Semantic change detection is the task of detecting and analysing word evolution in textual data, even over short periods of time. In this paper we focus on a new set of methods relying on contextualised embeddings, a type of semantic modelling that has recently revolutionised the NLP field. We leve…

    Submitted 23 January, 2020; v1 submitted 18 January, 2020; originally announced January 2020.

    Journal ref: WWW '20: Companion Proceedings of the Web Conference 2020 (April 2020), pp. 343–349

  11. arXiv:1912.01072

    cs.CL

    Leveraging Contextual Embeddings for Detecting Diachronic Semantic Shift

    Authors: Matej Martinc, Petra Kralj Novak, Senja Pollak

    Abstract: We propose a new method that leverages contextual embeddings for the task of diachronic semantic shift detection by generating time-specific word representations from BERT embeddings. The results of our experiments on the domain-specific LiverpoolFC corpus suggest that the proposed method has performance comparable to the current state of the art without requiring any time-consuming domain adaptat…

    Submitted 5 March, 2020; v1 submitted 2 December, 2019; originally announced December 2019.

    Comments: Accepted to the Language Resources and Evaluation Conference (LREC 2020)

  12. Supervised and Unsupervised Neural Approaches to Text Readability

    Authors: Matej Martinc, Senja Pollak, Marko Robnik-Šikonja

    Abstract: We present a set of novel neural supervised and unsupervised approaches for determining the readability of documents. In the unsupervised setting, we leverage neural language models, whereas in the supervised setting, three different neural classification architectures are tested. We show that the proposed neural unsupervised approach is robust, transferable across languages and allows adaptation…

    Submitted 11 March, 2021; v1 submitted 26 July, 2019; originally announced July 2019.

    Comments: 39 pages, published in the Computational Linguistics journal

  13. tax2vec: Constructing Interpretable Features from Taxonomies for Short Text Classification

    Authors: Blaž Škrlj, Matej Martinc, Jan Kralj, Nada Lavrač, Senja Pollak

    Abstract: Background knowledge remains largely unexploited in text classification tasks. This paper explores word taxonomies as a means of constructing new semantic features, which may improve the performance and robustness of the learned classifiers. We propose tax2vec, a parallel algorithm for constructing taxonomy-based features, and demonstrate its use on six short text classification problems: pre…

    Submitted 23 April, 2020; v1 submitted 1 February, 2019; originally announced February 2019.

    Comments: Accepted at the Computer Speech & Language (CSL) journal
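Entry 11 above describes diachronic semantic shift detection based on time-specific word representations generated from BERT embeddings. The sketch below shows one simple way such representations could be turned into a shift score. It assumes that a word's representation for a time slice is the mean of its contextual embeddings in that slice and that shift is measured as cosine distance between slices; it is not presented as the authors' exact pipeline. Random vectors stand in for real BERT embeddings, and the names `time_specific_representation` and `semantic_shift` are hypothetical.

```python
import numpy as np


def time_specific_representation(token_embeddings: np.ndarray) -> np.ndarray:
    """Average all contextual embeddings of one word observed in one time slice.

    Assumption: token_embeddings has shape (n_occurrences, dim), one row per
    occurrence of the word (e.g. a BERT hidden state for that token).
    """
    return token_embeddings.mean(axis=0)


def semantic_shift(emb_t1: np.ndarray, emb_t2: np.ndarray) -> float:
    """Cosine distance between the averaged representations of two time slices.

    0.0 means the averaged usage vectors point the same way; values near 1.0
    mean they are (close to) orthogonal.
    """
    v1 = time_specific_representation(emb_t1)
    v2 = time_specific_representation(emb_t2)
    cos = float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))
    return 1.0 - cos


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 768  # hidden size of BERT-base, used here only for realism

    # Synthetic "usages": small noise around a shared sense vector.
    sense_old = rng.normal(size=dim)
    sense_new = rng.normal(size=dim)
    usages_early = sense_old + 0.1 * rng.normal(size=(40, dim))
    usages_late_stable = sense_old + 0.1 * rng.normal(size=(40, dim))
    usages_late_shifted = sense_new + 0.1 * rng.normal(size=(40, dim))

    print(semantic_shift(usages_early, usages_late_stable))   # close to 0
    print(semantic_shift(usages_early, usages_late_shifted))  # close to 1
```

Averaging collapses all senses of a word into a single vector, so a word whose usage splits into several senses may appear less shifted than it is; clustering the per-occurrence embeddings before comparing time slices, as entry 10's title suggests, is one way around that.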