Showing 1–12 of 12 results for author: Fergadiotis, M

  1. arXiv:2210.05529  [pdf, other]

    cs.CL

    An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification

    Authors: Ilias Chalkidis, Xiang Dai, Manos Fergadiotis, Prodromos Malakasiotis, Desmond Elliott

    Abstract: Non-hierarchical sparse attention Transformer-based models, such as Longformer and Big Bird, are popular approaches to working with long documents. There are clear benefits to these approaches compared to the original Transformer in terms of efficiency, but Hierarchical Attention Transformer (HAT) models are a vastly understudied alternative. We develop and release fully pre-trained HAT models tha…

    Submitted 11 October, 2022; originally announced October 2022.

  2. FiNER: Financial Numeric Entity Recognition for XBRL Tagging

    Authors: Lefteris Loukas, Manos Fergadiotis, Ilias Chalkidis, Eirini Spyropoulou, Prodromos Malakasiotis, Ion Androutsopoulos, Georgios Paliouras

    Abstract: Publicly traded companies are required to submit periodic reports with eXtensive Business Reporting Language (XBRL) word-level tags. Manually tagging the reports is tedious and costly. We, therefore, introduce XBRL tagging as a new entity extraction task for the financial domain and release FiNER-139, a dataset of 1.1M sentences with gold XBRL tags. Unlike typical entity extraction datasets, FiNER…

    Submitted 19 April, 2022; v1 submitted 12 March, 2022; originally announced March 2022.

    Comments: 13 pages, long paper at ACL 2022

  3. arXiv:2109.14906  [pdf, other]

    cs.CL

    DICoE@FinSim-3: Financial Hypernym Detection using Augmented Terms and Distance-based Features

    Authors: Lefteris Loukas, Konstantinos Bougiatiotis, Manos Fergadiotis, Dimitris Mavroeidis, Elias Zavitsanos

    Abstract: We present the submission of team DICoE for FinSim-3, the 3rd Shared Task on Learning Semantic Similarities for the Financial Domain. The task provides a set of terms in the financial domain and requires to classify them into the most relevant hypernym from a financial ontology. After augmenting the terms with their Investopedia definitions, our system employs a Logistic Regression classifier over…

    Submitted 30 September, 2021; originally announced September 2021.

    Comments: 6 pages, Proceedings of the Third Workshop on Financial Technology and Natural Language Processing (FinNLP@IJCAI-2021)

    Report number: https://aclanthology.org/2021.finnlp-1.7

    Journal ref: In Proceedings of the Third Workshop on Financial Technology and Natural Language Processing (FinNLP 2021)

  4. EDGAR-CORPUS: Billions of Tokens Make The World Go Round

    Authors: Lefteris Loukas, Manos Fergadiotis, Ion Androutsopoulos, Prodromos Malakasiotis

    Abstract: We release EDGAR-CORPUS, a novel corpus comprising annual reports from all the publicly traded companies in the US spanning a period of more than 25 years. To the best of our knowledge, EDGAR-CORPUS is the largest financial NLP corpus available to date. All the reports are downloaded, split into their corresponding items (sections), and provided in a clean, easy-to-use JSON format. We use EDGAR-CO…

    Submitted 1 October, 2021; v1 submitted 29 September, 2021; originally announced September 2021.

    Comments: 6 pages, short paper at ECONLP 2021 Workshop, in conjunction with EMNLP 2021

  5. arXiv:2109.00904  [pdf, other]

    cs.CL

    MultiEURLEX -- A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer

    Authors: Ilias Chalkidis, Manos Fergadiotis, Ion Androutsopoulos

    Abstract: We introduce MULTI-EURLEX, a new multilingual dataset for topic classification of legal documents. The dataset comprises 65k European Union (EU) laws, officially translated in 23 languages, annotated with multiple labels from the EUROVOC taxonomy. We highlight the effect of temporal concept drift and the importance of chronological, instead of random splits. We use the dataset as a testbed for zer…

    Submitted 6 September, 2021; v1 submitted 2 September, 2021; originally announced September 2021.

    Comments: 9 pages, long paper at EMNLP 2021 proceedings

  6. arXiv:2103.13084  [pdf, other]

    cs.CL

    Paragraph-level Rationale Extraction through Regularization: A case study on European Court of Human Rights Cases

    Authors: Ilias Chalkidis, Manos Fergadiotis, Dimitrios Tsarapatsanis, Nikolaos Aletras, Ion Androutsopoulos, Prodromos Malakasiotis

    Abstract: Interpretability or explainability is an emerging research field in NLP. From a user-centric point of view, the goal is to build models that provide proper justification for their decisions, similar to those of humans, by requiring the models to satisfy additional constraints. To this end, we introduce a new application on legal text where, contrary to mainstream literature targeting word-level ra…

    Submitted 24 March, 2021; originally announced March 2021.

    Comments: 9 pages, long paper at NAACL 2021 proceedings

  7. arXiv:2101.10726  [pdf, other]

    cs.CL cs.IR

    Regulatory Compliance through Doc2Doc Information Retrieval: A case study in EU/UK legislation where text similarity has limitations

    Authors: Ilias Chalkidis, Manos Fergadiotis, Nikolaos Manginas, Eva Katakalou, Prodromos Malakasiotis

    Abstract: Major scandals in corporate history have urged the need for regulatory compliance, where organizations need to ensure that their controls (processes) comply with relevant laws, regulations, and policies. However, keeping track of the constantly changing legislation is difficult, thus organizations are increasingly adopting Regulatory Technology (RegTech) to facilitate the process. To this end, we…

    Submitted 26 January, 2021; originally announced January 2021.

    Comments: Accepted for publication by EACL 2021, 13 pages including references and appendices

  8. arXiv:2101.04355  [pdf, other]

    cs.CL

    Neural Contract Element Extraction Revisited: Letters from Sesame Street

    Authors: Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Ion Androutsopoulos

    Abstract: We investigate contract element extraction. We show that LSTM-based encoders perform better than dilated CNNs, Transformers, and BERT in this task. We also find that domain-specific WORD2VEC embeddings outperform generic pre-trained GLOVE embeddings. Morpho-syntactic features in the form of POS tag and token shape embeddings, as well as context-aware ELMO embeddings do not improve performance. Sev…

    Submitted 22 February, 2021; v1 submitted 12 January, 2021; originally announced January 2021.

    Comments: 6 pages

    Journal ref: updated version of the paper presented at Document Intelligence Workshop (NeurIPS 2019 Workshop)

  9. arXiv:2010.02559  [pdf, other]

    cs.CL

    LEGAL-BERT: The Muppets straight out of Law School

    Authors: Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Nikolaos Aletras, Ion Androutsopoulos

    Abstract: BERT has achieved impressive performance in several NLP tasks. However, there has been limited investigation on its adaptation guidelines in specialised domains. Here we focus on the legal domain, where we explore several approaches for applying BERT models to downstream legal tasks, evaluating on multiple datasets. Our findings indicate that the previous guidelines for pre-training and fine-tunin…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: 5 pages, short paper in Findings of EMNLP 2020

  10. arXiv:2010.01653  [pdf, other]

    cs.CL

    An Empirical Study on Large-Scale Multi-Label Text Classification Including Few and Zero-Shot Labels

    Authors: Ilias Chalkidis, Manos Fergadiotis, Sotiris Kotitsas, Prodromos Malakasiotis, Nikolaos Aletras, Ion Androutsopoulos

    Abstract: Large-scale Multi-label Text Classification (LMTC) has a wide range of Natural Language Processing (NLP) applications and presents interesting challenges. First, not all labels are well represented in the training set, due to the very large label set and the skewed label distributions of LMTC datasets. Also, label hierarchies and differences in human labelling guidelines may affect graph-aware ann…

    Submitted 4 October, 2020; originally announced October 2020.

    Comments: 9 pages, long paper at EMNLP 2020 proceedings

  11. arXiv:1906.02192  [pdf, other]

    cs.CL

    Large-Scale Multi-Label Text Classification on EU Legislation

    Authors: Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Ion Androutsopoulos

    Abstract: We consider Large-Scale Multi-Label Text Classification (LMTC) in the legal domain. We release a new dataset of 57k legislative documents from EURLEX, annotated with ~4.3k EUROVOC labels, which is suitable for LMTC, few- and zero-shot learning. Experimenting with several neural classifiers, we show that BIGRUs with label-wise attention perform better than other current state of the art methods. Do…

    Submitted 5 June, 2019; originally announced June 2019.

    Comments: 9 pages, short paper at ACL 2019. arXiv admin note: text overlap with arXiv:1905.10892

  12. arXiv:1905.10892  [pdf, other]

    cs.CL

    Extreme Multi-Label Legal Text Classification: A case study in EU Legislation

    Authors: Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Nikolaos Aletras, Ion Androutsopoulos

    Abstract: We consider the task of Extreme Multi-Label Text Classification (XMTC) in the legal domain. We release a new dataset of 57k legislative documents from EURLEX, the European Union's public document database, annotated with concepts from EUROVOC, a multidisciplinary thesaurus. The dataset is substantially larger than previous EURLEX datasets and suitable for XMTC, few-shot and zero-shot learning. Exp…

    Submitted 26 May, 2019; originally announced May 2019.

    Comments: 10 pages, long paper at NLLP Workshop of NAACL-HLT 2019