
Showing 1–35 of 35 results for author: Nenadic, G

  1. arXiv:2406.03151

    cs.CL cs.LG

    Which Side Are You On? A Multi-task Dataset for End-to-End Argument Summarisation and Evaluation

    Authors: Hao Li, Yu** Wu, Viktor Schlegel, Riza Batista-Navarro, Tharindu Madusanka, Iqra Zahid, Jiayan Zeng, Xiaochi Wang, Xinran He, Yizhi Li, Goran Nenadic

    Abstract: With the recent advances of large language models (LLMs), it is no longer infeasible to build an automated debate system that helps people to synthesise persuasive arguments. Previous work attempted this task by integrating multiple components. In our work, we introduce an argument mining dataset that captures the end-to-end process of preparing an argumentative essay for a debate, which covers th…

    Submitted 6 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: Published in Findings of ACL 2024

  2. arXiv:2405.16969

    cs.CL stat.AP

    The Multi-Range Theory of Translation Quality Measurement: MQM scoring models and Statistical Quality Control

    Authors: Arle Lommel, Serge Gladkoff, Alan Melby, Sue Ellen Wright, Ingemar Strandvik, Katerina Gasova, Angelika Vaasa, Andy Benzo, Romina Marazzato Sparano, Monica Foresi, Johani Innis, Lifeng Han, Goran Nenadic

    Abstract: The year 2024 marks the 10th anniversary of the Multidimensional Quality Metrics (MQM) framework for analytic translation quality evaluation. The MQM error typology has been widely used by practitioners in the translation and localization industry and has served as the basis for many derivative projects. The annual Conference on Machine Translation (WMT) shared tasks on both human and automatic tr…

    Submitted 9 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: working paper, 20 pages, under review

  3. arXiv:2405.12630

    cs.CL cs.AI

    Exploration of Masked and Causal Language Modelling for Text Generation

    Authors: Nicolo Micheletti, Samuel Belkadi, Lifeng Han, Goran Nenadic

    Abstract: Large Language Models (LLMs) have revolutionised the field of Natural Language Processing (NLP) and have achieved state-of-the-art performance in practically every task in this field. However, the prevalent approach used in text generation, Causal Language Modelling (CLM), which generates text sequentially from left to right, inherently limits the freedom of the model, which does not decide when a…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: working paper

  4. arXiv:2405.08172

    cs.CL cs.AI

    CANTONMT: Investigating Back-Translation and Model-Switch Mechanisms for Cantonese-English Neural Machine Translation

    Authors: Kung Yin Hong, Lifeng Han, Riza Batista-Navarro, Goran Nenadic

    Abstract: This paper investigates the development and evaluation of machine translation models from Cantonese to English, where we propose a novel approach to tackle low-resource language translations. The main objectives of the study are to develop a model that can effectively translate Cantonese to English and evaluate it against state-of-the-art commercial models. To achieve this, a new parallel corpus h…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: on-going work, 30 pages

  5. arXiv:2403.11346

    cs.CL cs.AI

    CantonMT: Cantonese to English NMT Platform with Fine-Tuned Models Using Synthetic Back-Translation Data

    Authors: Kung Yin Hong, Lifeng Han, Riza Batista-Navarro, Goran Nenadic

    Abstract: Neural Machine Translation (NMT) for low-resource languages is still a challenging task in front of NLP researchers. In this work, we deploy a standard data augmentation methodology by back-translation to a new language translation direction Cantonese-to-English. We present the models we fine-tuned using the limited amount of real data and the synthetic data we generated using back-translation inc…

    Submitted 9 June, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

    Comments: Accepted by: The 25th Annual Conference of The European Association for Machine Translation, 24 - 27 June 2024, Sheffield, UK (forthcoming)

  6. arXiv:2312.07250

    cs.CL cs.AI

    Neural Machine Translation of Clinical Text: An Empirical Investigation into Multilingual Pre-Trained Language Models and Transfer-Learning

    Authors: Lifeng Han, Serge Gladkoff, Gleb Erofeev, Irina Sorokina, Betty Galiano, Goran Nenadic

    Abstract: We conduct investigations on clinical text machine translation by examining multilingual neural network models using deep learning such as Transformer based structures. Furthermore, to address the language resource imbalance issue, we also carry out experiments using a transfer learning methodology based on massive multilingual pre-trained language models (MMPLMs). The experimental results on thre…

    Submitted 21 February, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: Accepted by Frontiers in Digital Health - Health Informatics

  7. arXiv:2312.03736

    cs.CL cs.AI cs.CR cs.DL cs.LG

    De-identification of clinical free text using natural language processing: A systematic review of current approaches

    Authors: Aleksandar Kovačević, Bojana Bašaragin, Nikola Milošević, Goran Nenadić

    Abstract: Background: Electronic health records (EHRs) are a valuable resource for data-driven medical research. However, the presence of protected health information (PHI) makes EHRs unsuitable to be shared for research purposes. De-identification, i.e. the process of removing PHI is a critical step in making EHR data accessible. Natural language processing has repeatedly demonstrated its feasibility in au…

    Submitted 28 November, 2023; originally announced December 2023.

    Comments: Submitted to Artificial Intelligence in Medicine

    Journal ref: Artificial Intelligence in Medicine, Volume 151, May 2024

  8. arXiv:2311.10856

    cs.AI

    Exploring the Consistency, Quality and Challenges in Manual and Automated Coding of Free-text Diagnoses from Hospital Outpatient Letters

    Authors: Warren Del-Pinto, George Demetriou, Meghna Jani, Rikesh Patel, Leanne Gray, Alex Bulcock, Niels Peek, Andrew S. Kanter, William G Dixon, Goran Nenadic

    Abstract: Coding of unstructured clinical free-text to produce interoperable structured data is essential to improve direct care, support clinical communication and to enable clinical research. However, manual clinical coding is difficult and time consuming, which motivates the development and use of natural language processing for automated coding. This work evaluates the quality and consistency of both man…

    Submitted 17 November, 2023; originally announced November 2023.

  9. arXiv:2310.19727

    cs.CL cs.AI cs.LG

    Generating Medical Prescriptions with Conditional Transformer

    Authors: Samuel Belkadi, Nicolo Micheletti, Lifeng Han, Warren Del-Pinto, Goran Nenadic

    Abstract: Access to real-world medication prescriptions is essential for medical research and healthcare quality improvement. However, access to real medication prescriptions is often limited due to the sensitive nature of the information expressed. Additionally, manually labelling these instructions for training and fine-tuning Natural Language Processing (NLP) models can be tedious and expensive. We intro…

    Submitted 18 November, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: Accepted to: Workshop on Synthetic Data Generation with Generative AI (SyntheticData4ML Workshop) at NeurIPS 2023

  10. arXiv:2310.02229

    cs.CL cs.AI

    Extraction of Medication and Temporal Relation from Clinical Text using Neural Language Models

    Authors: Hangyu Tu, Lifeng Han, Goran Nenadic

    Abstract: Clinical texts, represented in electronic medical records (EMRs), contain rich medical information and are essential for disease prediction, personalised information recommendation, clinical decision support, and medication pattern mining and measurement. Relation extractions between medication mentions and temporal information can further help clinicians better understand the patients' treatment…

    Submitted 8 October, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: working paper

  11. arXiv:2309.13202

    cs.CL cs.AI

    Investigating Large Language Models and Control Mechanisms to Improve Text Readability of Biomedical Abstracts

    Authors: Zihao Li, Samuel Belkadi, Nicolo Micheletti, Lifeng Han, Matthew Shardlow, Goran Nenadic

    Abstract: Biomedical literature often uses complex language and inaccessible professional terminologies. That is why simplification plays an important role in improving public health literacy. Applying Natural Language Processing (NLP) models to automate such tasks allows for quick and direct accessibility for lay readers. In this work, we investigate the ability of state-of-the-art large language models (L…

    Submitted 16 March, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

    Comments: Accepted by IEEE-ICHI 2024 https://ieeeichi2024.github.io/

  12. arXiv:2308.06546

    cs.CL cs.AI

    MC-DRE: Multi-Aspect Cross Integration for Drug Event/Entity Extraction

    Authors: Jie Yang, Soyeon Caren Han, Siqu Long, Josiah Poon, Goran Nenadic

    Abstract: Extracting meaningful drug-related information chunks, such as adverse drug events (ADE), is crucial for preventing morbidity and saving many lives. Most ADEs are reported via an unstructured conversation with the medical context, so applying a general entity recognition approach is not sufficient enough. In this paper, we propose a new multi-aspect cross-integration framework for drug entity/even…

    Submitted 15 August, 2023; v1 submitted 12 August, 2023; originally announced August 2023.

    Comments: Accepted at CIKM 2023

  13. arXiv:2308.03629

    cs.CL cs.AI cs.LG

    MedMine: Examining Pre-trained Language Models on Medication Mining

    Authors: Haifa Alrdahi, Lifeng Han, Hendrik Šuvalov, Goran Nenadic

    Abstract: Automatic medication mining from clinical and biomedical text has become a popular topic due to its real impact on healthcare applications and the recent development of powerful language models (LMs). However, fully-automatic extraction models still face obstacles to be overcome such that they can be deployed directly into clinical practice for better impacts. Such obstacles include their imbalanc…

    Submitted 8 August, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

    Comments: Open Research Project. 7 pages, 1 figure, 5 tables

  14. arXiv:2308.00158

    cs.CL cs.AI

    MTUncertainty: Assessing the Need for Post-editing of Machine Translation Outputs by Fine-tuning OpenAI LLMs

    Authors: Serge Gladkoff, Lifeng Han, Gleb Erofeev, Irina Sorokina, Goran Nenadic

    Abstract: Translation Quality Evaluation (TQE) is an essential step of the modern translation production process. TQE is critical in assessing both machine translation (MT) and human translation (HT) quality without reference translations. The ability to evaluate or even simply estimate the quality of translation automatically may open significant efficiency gains through process optimisation. This work exa…

    Submitted 21 June, 2024; v1 submitted 31 July, 2023; originally announced August 2023.

    Comments: Accepted by EAMT2024: The 25th Annual Conference of The European Association for Machine Translation

  15. arXiv:2307.02006

    cs.CL

    PULSAR at MEDIQA-Sum 2023: Large Language Models Augmented by Synthetic Dialogue Convert Patient Dialogues to Medical Records

    Authors: Viktor Schlegel, Hao Li, Yu** Wu, Anand Subramanian, Thanh-Tung Nguyen, Abhinav Ramesh Kashyap, Daniel Beck, Xiaojun Zeng, Riza Theresa Batista-Navarro, Stefan Winkler, Goran Nenadic

    Abstract: This paper describes PULSAR, our system submission at the ImageClef 2023 MediQA-Sum task on summarising patient-doctor dialogues into clinical records. The proposed framework relies on domain-specific pre-training, to produce a specialised language model which is trained on task-specific natural data augmented by synthetic data generated by a black-box LLM. We find limited evidence towards the eff…

    Submitted 4 July, 2023; originally announced July 2023.

    Comments: 8 pages. ImageClef 2023 MediQA-Sum

  16. arXiv:2306.02754

    cs.CL

    PULSAR: Pre-training with Extracted Healthcare Terms for Summarising Patients' Problems and Data Augmentation with Black-box Large Language Models

    Authors: Hao Li, Yu** Wu, Viktor Schlegel, Riza Batista-Navarro, Thanh-Tung Nguyen, Abhinav Ramesh Kashyap, Xiaojun Zeng, Daniel Beck, Stefan Winkler, Goran Nenadic

    Abstract: Medical progress notes play a crucial role in documenting a patient's hospital journey, including his or her condition, treatment plan, and any updates for healthcare providers. Automatic summarisation of a patient's problems in the form of a problem list can aid stakeholders in understanding a patient's condition, reducing workload and cognitive bias. BioNLP 2023 Shared Task 1A focuses on generat…

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: Accepted by ACL 2023's workshop BioNLP 2023

  17. arXiv:2305.16000

    cs.CL cs.AI

    Do You Hear The People Sing? Key Point Analysis via Iterative Clustering and Abstractive Summarisation

    Authors: Hao Li, Viktor Schlegel, Riza Batista-Navarro, Goran Nenadic

    Abstract: Argument summarisation is a promising but currently under-explored field. Recent work has aimed to provide textual summaries in the form of concise and salient short texts, i.e., key points (KPs), in a task known as Key Point Analysis (KPA). One of the main challenges in KPA is finding high-quality key point candidates from dozens of arguments even in a small corpus. Furthermore, evaluating key po…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: Accepted by ACL 2023 Main Conference

  18. arXiv:2303.04526

    cs.CL cs.IT math.NA stat.AP

    Student's t-Distribution: On Measuring the Inter-Rater Reliability When the Observations are Scarce

    Authors: Serge Gladkoff, Lifeng Han, Goran Nenadic

    Abstract: In natural language processing (NLP) we always rely on human judgement as the golden quality evaluation method. However, there has been an ongoing debate on how to better evaluate inter-rater reliability (IRR) levels for certain evaluation tasks, such as translation quality evaluation (TQE), especially when the data samples (observations) are very scarce. In this work, we first introduce the study…

    Submitted 9 July, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

    Comments: Accepted to RANLP2023: Recent Advances in Natural Language Processing, Varna, Bulgaria, 30 Aug - 8 Sep. https://ranlp.org/ranlp2023/

  19. arXiv:2301.03029

    cs.CL cs.SI

    Topic Modelling of Swedish Newspaper Articles about Coronavirus: a Case Study using Latent Dirichlet Allocation Method

    Authors: Bernadeta Griciūtė, Lifeng Han, Goran Nenadic

    Abstract: Topic Modelling (TM) is from the research branches of natural language understanding (NLU) and natural language processing (NLP) that is to facilitate insightful analysis from large documents and datasets, such as a summarisation of main topics and the topic changes. This kind of discovery is getting more popular in real-life applications due to its impact on big data analytics. In this study, fro…

    Submitted 18 April, 2023; v1 submitted 8 January, 2023; originally announced January 2023.

    Comments: Accepted to International HealthNLP WS @ IEEE-ICHI2023 https://ieeeichi.github.io/ICHI2023/

  20. arXiv:2210.12770

    cs.CL cs.AI cs.LG

    Exploring the Value of Pre-trained Language Models for Clinical Named Entity Recognition

    Authors: Samuel Belkadi, Lifeng Han, Yu** Wu, Goran Nenadic

    Abstract: The practice of fine-tuning Pre-trained Language Models (PLMs) from general or domain-specific data to a specific task with limited resources, has gained popularity within the field of natural language processing (NLP). In this work, we re-visit this assumption and carry out an investigation in clinical NLP, specifically Named Entity Recognition on drugs and their related attributes. We compare Tr…

    Submitted 30 October, 2023; v1 submitted 23 October, 2022; originally announced October 2022.

    Comments: working paper - Large Language Models, Fine-tuning LLMs, Clinical NLP, Medication Mining, AI for Healthcare

  21. arXiv:2210.06068

    cs.CL cs.AI

    Investigating Massive Multilingual Pre-Trained Machine Translation Models for Clinical Domain via Transfer Learning

    Authors: Lifeng Han, Gleb Erofeev, Irina Sorokina, Serge Gladkoff, Goran Nenadic

    Abstract: Massively multilingual pre-trained language models (MMPLMs) are developed in recent years demonstrating superpowers and the pre-knowledge they acquire for downstream tasks. This work investigates whether MMPLMs can be applied to clinical domain machine translation (MT) towards entirely unseen languages via transfer learning. We carry out an experimental investigation using Meta-AI's MMPLMs "wmt21…

    Submitted 4 June, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: Accepted to ClinicalNLP-2023 WS@ACL-2023

  22. arXiv:2210.04029

    cs.CL

    EDU-level Extractive Summarization with Varying Summary Lengths

    Authors: Yu** Wu, Ching-Hsun Tseng, Jiayu Shang, Shengzhong Mao, Goran Nenadic, Xiao-Jun Zeng

    Abstract: Extractive models usually formulate text summarization as extracting fixed top-k salient sentences from the document as a summary. Few works exploited extracting finer-grained Elementary Discourse Unit (EDU) with little analysis and justification for the extractive unit selection. Further, the selection strategy of the fixed top-k salient sentences fits the summarization need poorly, as the nu…

    Submitted 13 March, 2023; v1 submitted 8 October, 2022; originally announced October 2022.

    Comments: Accepted to EACL 2023 Findings

  23. arXiv:2209.07417

    cs.CL cs.AI

    Examining Large Pre-Trained Language Models for Machine Translation: What You Don't Know About It

    Authors: Lifeng Han, Gleb Erofeev, Irina Sorokina, Serge Gladkoff, Goran Nenadic

    Abstract: Pre-trained language models (PLMs) often take advantage of the monolingual and multilingual dataset that is freely available online to acquire general or mixed domain knowledge before deployment into specific tasks. Extra-large PLMs (xLPLMs) are proposed very recently to claim supreme performances over smaller-sized PLMs such as in machine translation (MT) tasks. These xLPLMs include Meta-AI's wmt…

    Submitted 25 October, 2022; v1 submitted 15 September, 2022; originally announced September 2022.

    Comments: System paper Accepted to WMT2022: BiomedicalMT Track (ClinSpEn2022)

  24. arXiv:2012.04056

    cs.CL cs.AI

    Semantics Altering Modifications for Evaluating Comprehension in Machine Reading

    Authors: Viktor Schlegel, Goran Nenadic, Riza Batista-Navarro

    Abstract: Advances in NLP have yielded impressive results for the task of machine reading comprehension (MRC), with approaches having been reported to achieve performance comparable to that of humans. In this paper, we investigate whether state-of-the-art MRC models are able to correctly process Semantics Altering Modifications (SAM): linguistically-motivated phenomena that alter the semantics of a sentence…

    Submitted 15 June, 2021; v1 submitted 7 December, 2020; originally announced December 2020.

    Comments: AAAI 2021, final version. 7 pages content + 2 pages references

  25. arXiv:2010.08433

    cs.CL cs.IR

    An efficient representation of chronological events in medical texts

    Authors: Andrey Kormilitzin, Nemanja Vaci, Qiang Liu, Hao Ni, Goran Nenadic, Alejo Nevado-Holgado

    Abstract: In this work we addressed the problem of capturing sequential information contained in longitudinal electronic health records (EHRs). Clinical notes, which is a particular type of EHR data, are a rich source of information and practitioners often develop clever solutions how to maximise the sequential information contained in free-texts. We proposed a systematic methodology for learning from chron…

    Submitted 24 October, 2020; v1 submitted 16 October, 2020; originally announced October 2020.

    Comments: 4 pages, 2 figures, 7 tables

  26. arXiv:2005.14709

    cs.CL

    Beyond Leaderboards: A survey of methods for revealing weaknesses in Natural Language Inference data and models

    Authors: Viktor Schlegel, Goran Nenadic, Riza Batista-Navarro

    Abstract: Recent years have seen a growing number of publications that analyse Natural Language Inference (NLI) datasets for superficial cues, whether they undermine the complexity of the tasks underlying those datasets and how they impact those models that are optimised and evaluated on this data. This structured survey provides an overview of the evolving research area by categorising reported weaknesses…

    Submitted 29 May, 2020; originally announced May 2020.

    Comments: 10 pages

  27. arXiv:2005.11687

    cs.CL cs.IR cs.LG

    MASK: A flexible framework to facilitate de-identification of clinical texts

    Authors: Nikola Milosevic, Gangamma Kalappa, Hesam Dadafarin, Mahmoud Azimaee, Goran Nenadic

    Abstract: Medical health records and clinical summaries contain a vast amount of important information in textual form that can help advancing research on treatments, drugs and public health. However, the majority of these information is not shared because they contain private information about patients, their families, or medical staff treating them. Regulations such as HIPAA in the US, PHIPA in Canada an…

    Submitted 9 October, 2020; v1 submitted 24 May, 2020; originally announced May 2020.

  28. arXiv:2003.04642

    cs.CL

    A Framework for Evaluation of Machine Reading Comprehension Gold Standards

    Authors: Viktor Schlegel, Marco Valentino, André Freitas, Goran Nenadic, Riza Batista-Navarro

    Abstract: Machine Reading Comprehension (MRC) is the task of answering a question over a paragraph of text. While neural MRC systems gain popularity and achieve noticeable performance, issues are being raised with the methodology used to establish their performance, particularly concerning the data design of gold standards that are used to evaluate them. There is but a limited understanding of the challenge…

    Submitted 10 March, 2020; originally announced March 2020.

    Comments: In Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020)

  29. arXiv:1909.10390

    cs.CL

    GNTeam at 2018 n2c2: Feature-augmented BiLSTM-CRF for drug-related entity recognition in hospital discharge summaries

    Authors: Maksim Belousov, Nikola Milosevic, Ghada Alfattni, Haifa Alrdahi, Goran Nenadic

    Abstract: Monitoring the administration of drugs and adverse drug reactions are key parts of pharmacovigilance. In this paper, we explore the extraction of drug mentions and drug-related information (reason for taking a drug, route, frequency, dosage, strength, form, duration, and adverse events) from hospital discharge summaries through deep learning that relies on various representations for clinical name…

    Submitted 23 September, 2019; originally announced September 2019.

  30. arXiv:1905.11716

    cs.CL cs.LG

    Extracting adverse drug reactions and their context using sequence labelling ensembles in TAC2017

    Authors: Maksim Belousov, Nikola Milosevic, William Dixon, Goran Nenadic

    Abstract: Adverse drug reactions (ADRs) are unwanted or harmful effects experienced after the administration of a certain drug or a combination of drugs, presenting a challenge for drug development and drug administration. In this paper, we present a set of taggers for extracting adverse drug reactions and related entities, including factors, severity, negations, drug class and animal. The systems used a mi…

    Submitted 28 May, 2019; originally announced May 2019.

    Comments: Paper describing submission for TAC ADR shared task

    Journal ref: Text Analytics Conference 2017

  31. arXiv:1905.09086

    cs.CL cs.IR cs.LG

    From web crawled text to project descriptions: automatic summarizing of social innovation projects

    Authors: Nikola Milosevic, Dimitar Marinov, Abdullah Gok, Goran Nenadic

    Abstract: In the past decade, social innovation projects have gained the attention of policy makers, as they address important social issues in an innovative manner. A database of social innovation is an important source of information that can expand collaboration between social innovators, drive policy and serve as an important resource for research. Such a database needs to have projects described and su…

    Submitted 22 May, 2019; originally announced May 2019.

    Comments: Keywords: Summarization, evaluation metrics, text mining, natural language processing, social innovation, SVM, neural networks. Accepted for publication in Proceedings of 24th International Conference on Applications of Natural Language to Information Systems (NLDB2019)

    Journal ref: Proceedings of 24th International Conference on Applications of Natural Language to Information Systems (NLDB2019)

  32. arXiv:1902.10031

    cs.CL cs.CV cs.LG

    A framework for information extraction from tables in biomedical literature

    Authors: Nikola Milosevic, Cassie Gregson, Robert Hernandez, Goran Nenadic

    Abstract: The scientific literature is growing exponentially, and professionals are no more able to cope with the current amount of publications. Text mining provided in the past methods to retrieve and extract information from text; however, most of these approaches ignored tables and figures. The research done in mining table data still does not have an integrated approach for mining that would consider a…

    Submitted 26 February, 2019; originally announced February 2019.

    Comments: 24 pages

    Journal ref: 2019, International Journal on Document Analysis and Recognition (IJDAR)

  33. arXiv:1811.10422

    cs.CL cs.AI cs.CY cs.LG

    Creating a contemporary corpus of similes in Serbian by using natural language processing

    Authors: Nikola Milosevic, Goran Nenadic

    Abstract: Simile is a figure of speech that compares two things through the use of connection words, but where comparison is not intended to be taken literally. They are often used in everyday communication, but they are also a part of linguistic cultural heritage. In this paper we present a methodology for semi-automated collection of similes from the World Wide Web using text mining and machine learning t…

    Submitted 22 November, 2018; originally announced November 2018.

    Comments: 15 pages, submitted to journal Slovo, however, later withdrawn to correct. Additional work was not done on it, so it is still waiting to be extended. Output of the system can be seen here: http://ezbirka.starisloveni.com/. arXiv admin note: text overlap with arXiv:1605.06319

  34. arXiv:1605.06319

    cs.CL cs.AI

    As Cool as a Cucumber: Towards a Corpus of Contemporary Similes in Serbian

    Authors: Nikola Milosevic, Goran Nenadic

    Abstract: Similes are natural language expressions used to compare unlikely things, where the comparison is not taken literally. They are often used in everyday communication and are an important part of cultural heritage. Having an up-to-date corpus of similes is challenging, as they are constantly coined and/or adapted to the contemporary times. In this paper we present a methodology for semi-automated co…

    Submitted 20 May, 2016; originally announced May 2016.

    Comments: Phrase modelling, simile extraction, language resource building, crowdsourcing

  35. arXiv:1304.7942

    cs.CL

    ManTIME: Temporal expression identification and normalization in the TempEval-3 challenge

    Authors: Michele Filannino, Gavin Brown, Goran Nenadic

    Abstract: This paper describes a temporal expression identification and normalization system, ManTIME, developed for the TempEval-3 challenge. The identification phase combines the use of conditional random fields along with a post-processing identification pipeline, whereas the normalization phase is carried out using NorMA, an open-source rule-based temporal normalizer. We investigate the performance vari…

    Submitted 30 April, 2013; originally announced April 2013.

    Comments: 5 pages, 1 figure, 2 tables. Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Seventh International Workshop on Semantic Evaluation (SemEval 2013)

    ACM Class: I.2.7; I.2.4; I.2.6