Showing 1–9 of 9 results for author: Sebastiani, F

Searching in archive stat.
  1. arXiv:1911.11506  [pdf, other]

    cs.LG cs.CL stat.ML

    Word-Class Embeddings for Multiclass Text Classification

    Authors: Alejandro Moreo, Andrea Esuli, Fabrizio Sebastiani

    Abstract: Pre-trained word embeddings encode general word semantics and lexical regularities of natural language, and have proven useful across many NLP tasks, including word sense disambiguation, machine translation, and sentiment analysis, to name a few. In supervised tasks such as multiclass text classification (the focus of this article) it seems appealing to enhance word representations with ad-hoc emb…

    Submitted 26 November, 2019; originally announced November 2019.

    Journal ref: Final version published in Data Mining and Knowledge Discovery 35(3), 911-963, 2021

  2. arXiv:1904.07965  [pdf, ps, other]

    cs.LG cs.IR stat.ML

    Cross-Lingual Sentiment Quantification

    Authors: Andrea Esuli, Alejandro Moreo, Fabrizio Sebastiani

    Abstract: Sentiment Quantification (i.e., the task of estimating the relative frequency of sentiment-related classes, such as Positive and Negative, in a set of unlabelled documents) is an important topic in sentiment analysis, as the study of sentiment-related quantities and trends across a population is often of higher interest than the analysis of individual instances. In thi…

    Submitted 7 July, 2020; v1 submitted 16 April, 2019; originally announced April 2019.

    Comments: Identical to previous version, but for the abstract, which is now identical to the one in the published version

    Journal ref: Final version published in IEEE Intelligent Systems 35(3):106-114, 2020

  3. arXiv:1903.12090  [pdf, other]

    cs.LG cs.IR stat.ML

    Learning to Weight for Text Classification

    Authors: Alejandro Moreo Fernández, Andrea Esuli, Fabrizio Sebastiani

    Abstract: In information retrieval (IR) and related tasks, term weighting approaches typically consider the frequency of the term in the document and in the collection in order to compute a score reflecting the importance of the term for the document. In tasks characterized by the presence of training data (such as text classification) it seems logical that the term weighting function should take into accou…

    Submitted 28 March, 2019; originally announced March 2019.

    Comments: To appear in IEEE Transactions on Knowledge and Data Engineering

    Journal ref: Final version published in IEEE Transactions on Knowledge and Data Engineering, 32(2):302-316, 2020

  4. arXiv:1901.11459  [pdf, other]

    cs.LG cs.AI cs.IR stat.ML

    Funnelling: A New Ensemble Method for Heterogeneous Transfer Learning and its Application to Cross-Lingual Text Classification

    Authors: Andrea Esuli, Alejandro Moreo, Fabrizio Sebastiani

    Abstract: Cross-lingual Text Classification (CLC) consists of automatically classifying, according to a common set C of classes, documents each written in one of a set of languages L, and doing so more accurately than when naively classifying each document via its corresponding language-specific classifier. In order to obtain an increase in the classification accuracy for a given language, the system thus n…

    Submitted 16 April, 2019; v1 submitted 31 January, 2019; originally announced January 2019.

    Comments: 28 pages, 4 figures

    Journal ref: Final version published in ACM Transactions on Information Systems 37(3), 37:1-37:30, 2019

  5. arXiv:1810.09311  [pdf, other]

    cs.CL cs.LG stat.ML

    Revisiting Distributional Correspondence Indexing: A Python Reimplementation and New Experiments

    Authors: Alejandro Moreo, Andrea Esuli, Fabrizio Sebastiani

    Abstract: This paper introduces PyDCI, a new implementation of Distributional Correspondence Indexing (DCI) written in Python. DCI is a transfer learning method for cross-domain and cross-lingual text classification for which we had provided an implementation (here called JaDCI) built on top of JaTeCS, a Java framework for text classification. PyDCI is a stand-alone version of DCI that exploits scikit-learn…

    Submitted 19 October, 2018; originally announced October 2018.

  6. arXiv:1809.01991  [pdf, other]

    cs.LG cs.AI cs.IR stat.ML

    Evaluation Measures for Quantification: An Axiomatic Approach

    Authors: Fabrizio Sebastiani

    Abstract: Quantification is the task of estimating, given a set $σ$ of unlabelled items and a set of classes $\mathcal{C}=\{c_{1}, \ldots, c_{|\mathcal{C}|}\}$, the prevalence (or 'relative frequency') in $σ$ of each class $c_{i}\in \mathcal{C}$. While quantification may in principle be solved by classifying each item in $σ$ and counting how many such items have been labelled with $c_{i}$, it has long been…

    Submitted 6 September, 2018; originally announced September 2018.

    Comments: 36 pages, 2 figures. Submitted for publication in the Information Retrieval Journal

    Journal ref: Final version published in Information Retrieval Journal 23(3):255-288, 2020

  7. arXiv:1809.00836  [pdf, other]

    cs.LG cs.CL stat.ML

    A Recurrent Neural Network for Sentiment Quantification

    Authors: Andrea Esuli, Alejandro Moreo Fernández, Fabrizio Sebastiani

    Abstract: Quantification is a supervised learning task that consists in predicting, given a set of classes C and a set D of unlabelled items, the prevalence (or relative frequency) p(c|D) of each class c in C. Quantification can in principle be solved by classifying all the unlabelled items and counting how many of them have been attributed to each class. However, this "classify and count" approach has been…

    Submitted 4 September, 2018; originally announced September 2018.

    Comments: Accepted for publication at CIKM 2018

    ACM Class: I.2.6; I.2.7

    Journal ref: Final version published in Proceedings of the 27th ACM International Conference on Information and Knowledge Management (CIKM 2018), Torino, IT, 2018

  8. Optimizing Non-decomposable Measures with Deep Networks

    Authors: Amartya Sanyal, Pawan Kumar, Purushottam Kar, Sanjay Chawla, Fabrizio Sebastiani

    Abstract: We present a class of algorithms capable of directly training deep neural networks with respect to large families of task-specific performance measures such as the F-measure and the Kullback-Leibler divergence that are structured and non-decomposable. This presents a departure from standard deep learning techniques that typically use squared or cross-entropy loss functions (that are decomposable)…

    Submitted 31 January, 2018; originally announced February 2018.

    Journal ref: Final version published in Machine Learning, 107(8-10):1597-1620, 2018

  9. arXiv:1605.04135  [pdf, other]

    stat.ML cs.AI cs.IR cs.LG

    Online Optimization Methods for the Quantification Problem

    Authors: Purushottam Kar, Shuai Li, Harikrishna Narasimhan, Sanjay Chawla, Fabrizio Sebastiani

    Abstract: The estimation of class prevalence, i.e., the fraction of a population that belongs to a certain class, is a very useful tool in data analytics and learning, and finds applications in many domains such as sentiment analysis, epidemiology, etc. For example, in sentiment analysis, the objective is often not to estimate whether a specific text conveys a positive or a negative sentiment, but rather es…

    Submitted 13 June, 2016; v1 submitted 13 May, 2016; originally announced May 2016.

    Comments: 26 pages, 6 figures. A short version of this manuscript will appear in the proceedings of the 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2016

    Journal ref: Final version published in Proceedings of the 22nd ACM Conference on Knowledge Discovery and Data Mining (KDD 2016), San Francisco, US, 2016, pp. 1625-1634