Showing 51–64 of 64 results for author: Aletras, N

  1. arXiv:2010.02559  [pdf, other]

    cs.CL

    LEGAL-BERT: The Muppets straight out of Law School

    Authors: Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Nikolaos Aletras, Ion Androutsopoulos

    Abstract: BERT has achieved impressive performance in several NLP tasks. However, there has been limited investigation on its adaptation guidelines in specialised domains. Here we focus on the legal domain, where we explore several approaches for applying BERT models to downstream legal tasks, evaluating on multiple datasets. Our findings indicate that the previous guidelines for pre-training and fine-tunin…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: 5 pages, short paper in Findings of EMNLP 2020

  2. arXiv:2010.01653  [pdf, other]

    cs.CL

    An Empirical Study on Large-Scale Multi-Label Text Classification Including Few and Zero-Shot Labels

    Authors: Ilias Chalkidis, Manos Fergadiotis, Sotiris Kotitsas, Prodromos Malakasiotis, Nikolaos Aletras, Ion Androutsopoulos

    Abstract: Large-scale Multi-label Text Classification (LMTC) has a wide range of Natural Language Processing (NLP) applications and presents interesting challenges. First, not all labels are well represented in the training set, due to the very large label set and the skewed label distributions of LMTC datasets. Also, label hierarchies and differences in human labelling guidelines may affect graph-aware ann…

    Submitted 4 October, 2020; originally announced October 2020.

    Comments: 9 pages, long paper at EMNLP 2020 proceedings

  3. arXiv:2009.14734  [pdf, other]

    cs.CL cs.SI

    Point-of-Interest Type Inference from Social Media Text

    Authors: Danae Sánchez Villegas, Daniel Preoţiuc-Pietro, Nikolaos Aletras

    Abstract: Physical places help shape how we perceive the experiences we have there. For the first time, we study the relationship between social media text and the type of the place from where it was posted, whether a park, restaurant, or someplace else. To facilitate this, we introduce a novel data set of $\sim$200,000 English tweets published from 2,761 different points-of-interest in the U.S., enriched w…

    Submitted 2 October, 2020; v1 submitted 30 September, 2020; originally announced September 2020.

    Comments: Accepted at AACL-IJCNLP 2020

  4. Automatic Generation of Topic Labels

    Authors: Areej Alokaili, Nikolaos Aletras, Mark Stevenson

    Abstract: Topic modelling is a popular unsupervised method for identifying the underlying themes in document collections that has many applications in information retrieval. A topic is usually represented by a list of terms ranked by their probability but, since these can be difficult to interpret, various approaches have been developed to assign descriptive labels to topics. Previous work on the automatic…

    Submitted 29 May, 2020; originally announced June 2020.

    Comments: Short paper accepted at SIGIR '20

  5. arXiv:2005.10608  [pdf, other]

    cs.CL

    Unsupervised Quality Estimation for Neural Machine Translation

    Authors: Marina Fomicheva, Shuo Sun, Lisa Yankovskaya, Frédéric Blain, Francisco Guzmán, Mark Fishel, Nikolaos Aletras, Vishrav Chaudhary, Lucia Specia

    Abstract: Quality Estimation (QE) is an important component in making Machine Translation (MT) useful in real-world applications, as it is aimed to inform the user on the quality of the MT output at test time. Existing approaches require large amounts of expert annotated data, computation and time for training. As an alternative, we devise an unsupervised approach to QE where no training or access to additi…

    Submitted 20 July, 2020; v1 submitted 21 May, 2020; originally announced May 2020.

    Comments: Accepted for publication in TACL. Authors' final version

  6. arXiv:2004.13878  [pdf, other]

    cs.CL

    Analyzing Political Parody in Social Media

    Authors: Antonis Maronikolakis, Danae Sanchez Villegas, Daniel Preotiuc-Pietro, Nikolaos Aletras

    Abstract: Parody is a figurative device used to imitate an entity for comedic or critical purposes and represents a widespread phenomenon in social media through many popular parody accounts. In this paper, we present the first computational study of parody. We introduce a new publicly available data set of tweets from real politicians and their corresponding parody accounts. We run a battery of supervised…

    Submitted 1 May, 2020; v1 submitted 28 April, 2020; originally announced April 2020.

  7. arXiv:1906.03890  [pdf, ps, other]

    cs.CL cs.SI

    Automatically Identifying Complaints in Social Media

    Authors: Daniel Preotiuc-Pietro, Mihaela Gaman, Nikolaos Aletras

    Abstract: Complaining is a basic speech act regularly used in human and computer mediated communication to express a negative mismatch between reality and expectations in a particular situation. Automatically identifying complaints in social media is of utmost importance for organizations or brands to improve the customer experience or in developing dialogue systems for handling and responding to complaints…

    Submitted 10 June, 2019; originally announced June 2019.

    Comments: Accepted at ACL 2019

  8. arXiv:1906.02059  [pdf, other]

    cs.CL

    Neural Legal Judgment Prediction in English

    Authors: Ilias Chalkidis, Ion Androutsopoulos, Nikolaos Aletras

    Abstract: Legal judgment prediction is the task of automatically predicting the outcome of a court case, given a text describing the case's facts. Previous work on using neural models for this task has focused on Chinese; only feature-based models (e.g., using bags of words and topics) have been considered in English. We release a new English legal judgment prediction dataset, containing cases from the Euro…

    Submitted 5 June, 2019; originally announced June 2019.

    Comments: 7 pages, short paper at ACL 2019

  9. arXiv:1905.10892  [pdf, other]

    cs.CL

    Extreme Multi-Label Legal Text Classification: A case study in EU Legislation

    Authors: Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Nikolaos Aletras, Ion Androutsopoulos

    Abstract: We consider the task of Extreme Multi-Label Text Classification (XMTC) in the legal domain. We release a new dataset of 57k legislative documents from EURLEX, the European Union's public document database, annotated with concepts from EUROVOC, a multidisciplinary thesaurus. The dataset is substantially larger than previous EURLEX datasets and suitable for XMTC, few-shot and zero-shot learning. Exp…

    Submitted 26 May, 2019; originally announced May 2019.

    Comments: 10 pages, long paper at NLLP Workshop of NAACL-HLT 2019

  10. arXiv:1903.12542  [pdf, other]

    cs.CL cs.IR

    Re-Ranking Words to Improve Interpretability of Automatically Generated Topics

    Authors: Areej Alokaili, Nikolaos Aletras, Mark Stevenson

    Abstract: Topic models, such as LDA, are widely used in Natural Language Processing. Making their output interpretable is an important area of research with applications to areas such as the enhancement of exploratory search interfaces and the development of interpretable machine learning models. Conventionally, topics are represented by their n most probable words, however, these representations are often…

    Submitted 29 March, 2019; originally announced March 2019.

    Comments: Paper accepted for publication at IWCS 2019

  11. arXiv:1812.00086  [pdf, other]

    cs.LG stat.ML

    Graph Node-Feature Convolution for Representation Learning

    Authors: Li Zhang, Heda Song, Nikolaos Aletras, Haiping Lu

    Abstract: Graph convolutional network (GCN) is an emerging neural network approach. It learns new representation of a node by aggregating feature vectors of all neighbors in the aggregation process without considering whether the neighbors or features are useful or not. Recent methods have improved solutions by sampling a fixed size set of neighbors, or assigning different weights to different neighbors in…

    Submitted 31 March, 2022; v1 submitted 30 November, 2018; originally announced December 2018.

  12. arXiv:1808.08538  [pdf, other]

    cs.CY cs.CL cs.SI

    Nowcasting the Stance of Social Media Users in a Sudden Vote: The Case of the Greek Referendum

    Authors: Adam Tsakalidis, Nikolaos Aletras, Alexandra I. Cristea, Maria Liakata

    Abstract: Modelling user voting intention in social media is an important research area, with applications in analysing electorate behaviour, online political campaigning and advertising. Previous approaches mainly focus on predicting national general elections, which are regularly scheduled and where data of past results and opinion polls are available. However, there is no evidence of how such models woul…

    Submitted 26 August, 2018; originally announced August 2018.

    Comments: Preprint accepted for publication in the ACM International Conference on Information and Knowledge Management (CIKM 2018)

  13. arXiv:1804.04095  [pdf, other]

    cs.CL cs.AI cs.SI

    Predicting Twitter User Socioeconomic Attributes with Network and Language Information

    Authors: Nikolaos Aletras, Benjamin Paul Chamberlain

    Abstract: Inferring socioeconomic attributes of social media users such as occupation and income is an important problem in computational social science. Automated inference of such characteristics has applications in personalised recommender systems, targeted computational advertising and online political campaigning. While previous work has shown that language features can reliably predict socioeconomic a…

    Submitted 11 April, 2018; originally announced April 2018.

    Comments: Accepted at ACM HT 2018

  14. arXiv:1608.00470  [pdf, other]

    cs.CL cs.CV

    Labeling Topics with Images using Neural Networks

    Authors: Nikolaos Aletras, Arpit Mittal

    Abstract: Topics generated by topic models are usually represented by lists of $t$ terms or alternatively using short phrases and images. The current state-of-the-art work on labeling topics using images selects images by re-ranking a small set of candidates for a given topic. In this paper, we present a more generic method that can estimate the degree of association between any arbitrary pair of an unseen…

    Submitted 3 January, 2017; v1 submitted 1 August, 2016; originally announced August 2016.