
Showing 1–22 of 22 results for author: Milosevic, N

  1. arXiv:2407.05015  [pdf, other]

    cs.CL cs.AI

    How do you know that? Teaching Generative Language Models to Reference Answers to Biomedical Questions

    Authors: Bojana Bašaragin, Adela Ljajić, Darija Medvecki, Lorenzo Cassano, Miloš Košprdić, Nikola Milošević

    Abstract: Large language models (LLMs) have recently become the leading source of answers for users' questions online. Despite their ability to offer eloquent answers, their accuracy and reliability can pose a significant challenge. This is especially true for sensitive domains such as biomedicine, where there is a higher need for factually correct answers. This paper introduces a biomedical retrieval-augme…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: Accepted at BioNLP Workshop 2024, colocated with ACL 2024

  2. arXiv:2406.03845  [pdf, other]

    cs.LG cs.RO eess.SY

    Open Problem: Active Representation Learning

    Authors: Nikola Milosevic, Gesine Müller, Jan Huisken, Nico Scherf

    Abstract: In this work, we introduce the concept of Active Representation Learning, a novel class of problems that intertwines exploration and representation learning within partially observable environments. We extend ideas from Active Simultaneous Localization and Mapping (active SLAM), and translate them to scientific discovery problems, exemplified by adaptive microscopy. We explore the need for a frame…

    Submitted 6 June, 2024; originally announced June 2024.

  3. arXiv:2402.18589  [pdf, other]

    cs.IR cs.AI cs.CL cs.LG

    Verif.ai: Towards an Open-Source Scientific Generative Question-Answering System with Referenced and Verifiable Answers

    Authors: Miloš Košprdić, Adela Ljajić, Bojana Bašaragin, Darija Medvecki, Nikola Milošević

    Abstract: In this paper, we present the current progress of the project Verif.ai, an open-source scientific generative question-answering system with referenced and verified answers. The components of the system are (1) an information retrieval system combining semantic and lexical search techniques over scientific papers (PubMed), (2) a fine-tuned generative model (Mistral 7B) taking top answers and genera…

    Submitted 9 February, 2024; originally announced February 2024.

    Comments: Accepted as a short paper at The Sixteenth International Conference on Evolving Internet (INTERNET 2024)

    Journal ref: The Sixteenth International Conference on Evolving Internet (INTERNET 2024)

  4. Multilingual transformer and BERTopic for short text topic modeling: The case of Serbian

    Authors: Darija Medvecki, Bojana Bašaragin, Adela Ljajić, Nikola Milošević

    Abstract: This paper presents the results of the first application of BERTopic, a state-of-the-art topic modeling technique, to short text written in a morphologically rich language. We applied BERTopic with three multilingual embedding models on two levels of text preprocessing (partial and full) to evaluate its performance on partially preprocessed short text in Serbian. We also compared it to LDA and…

    Submitted 5 February, 2024; originally announced February 2024.

    Journal ref: Trajanovic, M., Filipovic, N., Zdravkovic, M. (eds) Disruptive Information Technologies for a Smart Society. ICIST 2023. Lecture Notes in Networks and Systems, vol 872. Springer, Cham

  5. arXiv:2312.03736  [pdf]

    cs.CL cs.AI cs.CR cs.DL cs.LG

    De-identification of clinical free text using natural language processing: A systematic review of current approaches

    Authors: Aleksandar Kovačević, Bojana Bašaragin, Nikola Milošević, Goran Nenadić

    Abstract: Background: Electronic health records (EHRs) are a valuable resource for data-driven medical research. However, the presence of protected health information (PHI) makes EHRs unsuitable to be shared for research purposes. De-identification, i.e. the process of removing PHI is a critical step in making EHR data accessible. Natural language processing has repeatedly demonstrated its feasibility in au…

    Submitted 28 November, 2023; originally announced December 2023.

    Comments: Submitted to Artificial Intelligence in Medicine

    Journal ref: Artificial Intelligence in Medicine, Volume 151, May 2024

  6. arXiv:2305.04928  [pdf, other]

    cs.CL cs.AI

    From Zero to Hero: Harnessing Transformers for Biomedical Named Entity Recognition in Zero- and Few-shot Contexts

    Authors: Miloš Košprdić, Nikola Prodanović, Adela Ljajić, Bojana Bašaragin, Nikola Milošević

    Abstract: Supervised named entity recognition (NER) in the biomedical domain depends on large sets of annotated texts with the given named entities. The creation of such datasets can be time-consuming and expensive, while extraction of new entities requires additional annotation tasks and retraining the model. To address these challenges, this paper proposes a method for zero- and few-shot NER in the biomed…

    Submitted 25 January, 2024; v1 submitted 5 May, 2023; originally announced May 2023.

    Comments: Collaboration between Bayer Pharma R&D and Serbian Institute for Artificial Intelligence Research and Development

  7. arXiv:2304.05468  [pdf, ps, other]

    cs.CL cs.DL cs.HC

    A Survey of Resources and Methods for Natural Language Processing of Serbian Language

    Authors: Ulfeta A. Marovac, Aldina R. Avdić, Nikola Lj. Milošević

    Abstract: The Serbian language is a Slavic language spoken by over 12 million speakers and well understood by over 15 million people. In the area of natural language processing, it can be considered a low-resourced language. Also, Serbian is considered a high-inflectional language. The combination of many word inflections and low availability of language resources makes natural language processing of Serbia…

    Submitted 11 April, 2023; originally announced April 2023.

    Comments: 43 pages, submitted to Artificial Intelligence Review Journal

    ACM Class: A.1

  8. arXiv:2205.11269  [pdf, other]

    cs.CV

    Dynamic Split Computing for Efficient Deep Edge Intelligence

    Authors: Arian Bakhtiarnia, Nemanja Milošević, Qi Zhang, Dragana Bajović, Alexandros Iosifidis

    Abstract: Deploying deep neural networks (DNNs) on IoT and mobile devices is a challenging task due to their limited computational resources. Thus, demanding tasks are often entirely offloaded to edge servers which can accelerate inference, however, it also causes communication cost and evokes privacy concerns. In addition, this approach leaves the computational capacity of end devices unused. Split computi…

    Submitted 17 June, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

    Comments: Accepted by the 2022 International Conference on Machine Learning (ICML 2022) DyNN Workshop

  9. arXiv:2204.02593  [pdf, other]

    math.OC cs.IT cs.LG

    Nonlinear gradient mappings and stochastic optimization: A general framework with applications to heavy-tail noise

    Authors: Dusan Jakovetic, Dragana Bajovic, Anit Kumar Sahu, Soummya Kar, Nemanja Milosevic, Dusan Stamenkovic

    Abstract: We introduce a general framework for nonlinear stochastic gradient descent (SGD) for the scenarios when gradient noise exhibits heavy tails. The proposed framework subsumes several popular nonlinearity choices, like clipped, normalized, signed or quantized gradient, but we also consider novel nonlinearity choices. We establish for the considered class of methods strong convergence guarantees assum…

    Submitted 6 April, 2022; originally announced April 2022.

    Comments: Submitted for publication Nov 2021

  10. arXiv:2201.01647  [pdf, other]

    cs.AI cs.CL cs.IR cs.LG

    Comparison of biomedical relationship extraction methods and models for knowledge graph creation

    Authors: Nikola Milosevic, Wolfgang Thielemann

    Abstract: Biomedical research is growing at such an exponential pace that scientists, researchers, and practitioners are no more able to cope with the amount of published literature in the domain. The knowledge presented in the literature needs to be systematized in such a way that claims and hypotheses can be easily found, accessed, and validated. Knowledge graphs can provide such a framework for semantic…

    Submitted 7 August, 2022; v1 submitted 5 January, 2022; originally announced January 2022.

    Comments: Paper submitted to Journal of Semantic Web

    ACM Class: E.2; I.7

    Journal ref: Nikola Milosevic, Wolfgang Thielemann, Comparison of biomedical relationship extraction methods and models for knowledge graph creation, Journal of Web Semantics, 2022, 100756, ISSN 1570-8268,

  11. arXiv:2005.11687  [pdf, other]

    cs.CL cs.IR cs.LG

    MASK: A flexible framework to facilitate de-identification of clinical texts

    Authors: Nikola Milosevic, Gangamma Kalappa, Hesam Dadafarin, Mahmoud Azimaee, Goran Nenadic

    Abstract: Medical health records and clinical summaries contain a vast amount of important information in textual form that can help advancing research on treatments, drugs and public health. However, the majority of these information is not shared because they contain private information about patients, their families, or medical staff treating them. Regulations such as HIPPA in the US, PHIPPA in Canada an…

    Submitted 9 October, 2020; v1 submitted 24 May, 2020; originally announced May 2020.

  12. arXiv:1910.10660  [pdf]

    cs.CR cs.CY cs.LG

    Deep learning guided Android malware and anomaly detection

    Authors: Nikola Milosevic, Junfan Huang

    Abstract: In the past decade, the cyber-crime related to mobile devices has increased. Mobile devices, especially the ones running on Android operating system are particularly interesting to malware creators, as the users often keep the biggest amount of personal information on their mobile devices, such as their contacts, social media profiles, emails, and bank accounts. Both dynamic and static malware ana…

    Submitted 23 October, 2019; originally announced October 2019.

    Comments: First (draft) version of the paper

  13. arXiv:1909.10390  [pdf]

    cs.CL

    GNTeam at 2018 n2c2: Feature-augmented BiLSTM-CRF for drug-related entity recognition in hospital discharge summaries

    Authors: Maksim Belousov, Nikola Milosevic, Ghada Alfattni, Haifa Alrdahi, Goran Nenadic

    Abstract: Monitoring the administration of drugs and adverse drug reactions are key parts of pharmacovigilance. In this paper, we explore the extraction of drug mentions and drug-related information (reason for taking a drug, route, frequency, dosage, strength, form, duration, and adverse events) from hospital discharge summaries through deep learning that relies on various representations for clinical name…

    Submitted 23 September, 2019; originally announced September 2019.

  14. arXiv:1905.11716  [pdf, other]

    cs.CL cs.LG

    Extracting adverse drug reactions and their context using sequence labelling ensembles in TAC2017

    Authors: Maksim Belousov, Nikola Milosevic, William Dixon, Goran Nenadic

    Abstract: Adverse drug reactions (ADRs) are unwanted or harmful effects experienced after the administration of a certain drug or a combination of drugs, presenting a challenge for drug development and drug administration. In this paper, we present a set of taggers for extracting adverse drug reactions and related entities, including factors, severity, negations, drug class and animal. The systems used a mi…

    Submitted 28 May, 2019; originally announced May 2019.

    Comments: Paper describing submission for TAC ADR shared task

    Journal ref: Text Analytics Conference 2017

  15. arXiv:1905.09086  [pdf, other]

    cs.CL cs.IR cs.LG

    From web crawled text to project descriptions: automatic summarizing of social innovation projects

    Authors: Nikola Milosevic, Dimitar Marinov, Abdullah Gok, Goran Nenadic

    Abstract: In the past decade, social innovation projects have gained the attention of policy makers, as they address important social issues in an innovative manner. A database of social innovation is an important source of information that can expand collaboration between social innovators, drive policy and serve as an important resource for research. Such a database needs to have projects described and su…

    Submitted 22 May, 2019; originally announced May 2019.

    Comments: Keywords: Summarization, evaluation metrics, text mining, natural language processing, social innovation, SVM, neural networks. Accepted for publication in Proceedings of 24th International Conference on Applications of Natural Language to Information Systems (NLDB2019)

    Journal ref: Proceedings of 24th International Conference on Applications of Natural Language to Information Systems (NLDB2019)

  16. arXiv:1902.10031  [pdf]

    cs.CL cs.CV cs.LG

    A framework for information extraction from tables in biomedical literature

    Authors: Nikola Milosevic, Cassie Gregson, Robert Hernandez, Goran Nenadic

    Abstract: The scientific literature is growing exponentially, and professionals are no more able to cope with the current amount of publications. Text mining provided in the past methods to retrieve and extract information from text; however, most of these approaches ignored tables and figures. The research done in mining table data still does not have an integrated approach for mining that would consider a…

    Submitted 26 February, 2019; originally announced February 2019.

    Comments: 24 pages

    Journal ref: 2019, International Journal on Document Analysis and Recognition (IJDAR)

  17. arXiv:1811.10422  [pdf]

    cs.CL cs.AI cs.CY cs.LG

    Creating a contemporary corpus of similes in Serbian by using natural language processing

    Authors: Nikola Milosevic, Goran Nenadic

    Abstract: Simile is a figure of speech that compares two things through the use of connection words, but where comparison is not intended to be taken literally. They are often used in everyday communication, but they are also a part of linguistic cultural heritage. In this paper we present a methodology for semi-automated collection of similes from the World Wide Web using text mining and machine learning t…

    Submitted 22 November, 2018; originally announced November 2018.

    Comments: 15 pages, submitted to journal Slovo, however, later withdrawn to correct. Additional work was not done on it, so it is still waiting to be extended. Output of the system can be seen here: http://ezbirka.starisloveni.com/. arXiv admin note: text overlap with arXiv:1605.06319

  18. arXiv:1605.06319  [pdf, other]

    cs.CL cs.AI

    As Cool as a Cucumber: Towards a Corpus of Contemporary Similes in Serbian

    Authors: Nikola Milosevic, Goran Nenadic

    Abstract: Similes are natural language expressions used to compare unlikely things, where the comparison is not taken literally. They are often used in everyday communication and are an important part of cultural heritage. Having an up-to-date corpus of similes is challenging, as they are constantly coined and/or adapted to the contemporary times. In this paper we present a methodology for semi-automated co…

    Submitted 20 May, 2016; originally announced May 2016.

    Comments: Phrase modelling, simile extraction, language resource building, crowdsourcing

  19. arXiv:1603.00751  [pdf]

    cs.LG q-fin.GN

    Equity forecast: Predicting long term stock price movement using machine learning

    Authors: Nikola Milosevic

    Abstract: Long term investment is one of the major investment strategies. However, calculating intrinsic value of some company and evaluating shares for long term investment is not easy, since analyst have to care about a large number of financial indicators and evaluate them in a right manner. So far, little help in predicting the direction of the company value over the longer period of time has been provi…

    Submitted 22 November, 2018; v1 submitted 2 March, 2016; originally announced March 2016.

    Comments: 9 pages, 3 tables, computational finance, algorithmic finance

    Journal ref: Journal of Economics Library, 3(2), 2016, 288-294

  20. arXiv:1602.00515  [pdf]

    cs.AI cs.CL

    Marvin: Semantic annotation using multiple knowledge sources

    Authors: Nikola Milosevic

    Abstract: People are producing more written material then anytime in the history. The increase is so high that professionals from the various fields are no more able to cope with this amount of publications. Text mining tools can offer tools to help them and one of the tools that can aid information retrieval and information extraction is semantic text annotation. In this report we present Marvin, a text an…

    Submitted 2 February, 2016; v1 submitted 1 February, 2016; originally announced February 2016.

    Comments: 9 pages, 4 figures, keywords: Semantic annotation, text normalization, semantic web, linked data, information management, text mining, information extraction, data curation

    ACM Class: D.3.2; K.2; H.2.4

  21. arXiv:1302.5392  [pdf]

    cs.CR

    History of malware

    Authors: Nikola Milošević

    Abstract: In past three decades almost everything has changed in the field of malware and malware analysis. From malware created as proof of some security concept and malware created for financial gain to malware created to sabotage infrastructure. In this work we will focus on history and evolution of malware and describe most important malwares.

    Submitted 16 January, 2014; v1 submitted 21 February, 2013; originally announced February 2013.

    Comments: 11 pages, 8 figures describing history and evolution of PC malware from first PC malware to Stuxnet, DoQu and Flame. This article has been withdrawn due to some errors in the text and its publication in a journal that asked for the article to be withdrawn from other sources

    MSC Class: 68-03

  22. arXiv:1209.4471  [pdf]

    cs.CL cs.IR

    Stemmer for Serbian language

    Authors: Nikola Milošević

    Abstract: In linguistic morphology and information retrieval, stemming is the process for reducing inflected (or sometimes derived) words to their stem, base or root form; generally a written word form. In this work is presented suffix stripping stemmer for Serbian language, one of the highly inflectional languages.

    Submitted 20 September, 2012; originally announced September 2012.

    Comments: 16 pages, 8 figures, code included