Showing 1–8 of 8 results for author: Paul, M J

Searching in archive cs.
  1. arXiv:2102.11103  [pdf, other]

    cs.CL cs.LG

    User Factor Adaptation for User Embedding via Multitask Learning

    Authors: Xiaolei Huang, Michael J. Paul, Robin Burke, Franck Dernoncourt, Mark Dredze

    Abstract: Language varies across users and their interested fields in social media data: words authored by a user across his/her interests may have different meanings (e.g., cool) or sentiments (e.g., fast). However, most of the existing methods to train user embeddings ignore the variations across user interests, such as product and movie categories (e.g., drama vs. action). In this study, we treat the use…

    Submitted 22 February, 2021; originally announced February 2021.

    Comments: Accepted in the Second Workshop on Domain Adaptation for Natural Language Processing (Adapted-NLP)

  2. arXiv:2005.00524  [pdf, other]

    cs.CL cs.LG

    Why Overfitting Isn't Always Bad: Retrofitting Cross-Lingual Word Embeddings to Dictionaries

    Authors: Mozhi Zhang, Yoshinari Fujinuma, Michael J. Paul, Jordan Boyd-Graber

    Abstract: Cross-lingual word embeddings (CLWE) are often evaluated on bilingual lexicon induction (BLI). Recent CLWE methods use linear projections, which underfit the training dictionary, to generalize on BLI. However, underfitting can hinder generalization to other downstream tasks that rely on words from the training dictionary. We address this limitation by retrofitting CLWE to the training dictionary,…

    Submitted 1 May, 2020; originally announced May 2020.

    Comments: ACL 2020

  3. arXiv:2002.10361  [pdf, other]

    cs.CL

    Multilingual Twitter Corpus and Baselines for Evaluating Demographic Bias in Hate Speech Recognition

    Authors: Xiaolei Huang, Linzi Xing, Franck Dernoncourt, Michael J. Paul

    Abstract: Existing research on fairness evaluation of document classification models mainly uses synthetic monolingual data without ground truth for author demographic attributes. In this work, we assemble and publish a multilingual Twitter corpus for the task of hate speech detection with inferred four author demographic factors: age, country, gender and race/ethnicity. The corpus covers five languages: En…

    Submitted 3 March, 2020; v1 submitted 24 February, 2020; originally announced February 2020.

    Comments: Accepted at LREC 2020

  4. arXiv:1909.03524  [pdf, other]

    cs.CL

    Evaluating Topic Quality with Posterior Variability

    Authors: Linzi Xing, Michael J. Paul, Giuseppe Carenini

    Abstract: Probabilistic topic models such as latent Dirichlet allocation (LDA) are popularly used with Bayesian inference methods such as Gibbs sampling to learn posterior distributions over topic model parameters. We derive a novel measure of LDA topic quality using the variability of the posterior distributions. Compared to several existing baselines for automatic topic evaluation, the proposed metric ach…

    Submitted 15 September, 2019; v1 submitted 8 September, 2019; originally announced September 2019.

    Comments: 8 pages

  5. A Resource-Free Evaluation Metric for Cross-Lingual Word Embeddings Based on Graph Modularity

    Authors: Yoshinari Fujinuma, Jordan Boyd-Graber, Michael J. Paul

    Abstract: Cross-lingual word embeddings encode the meaning of words from different languages into a shared low-dimensional space. An important requirement for many downstream tasks is that word similarity should be independent of language - i.e., word vectors within one language should not be more similar to each other than to words in another language. We measure this characteristic using modularity, a net…

    Submitted 5 June, 2019; originally announced June 2019.

    Comments: Accepted to ACL 2019, camera-ready

  6. arXiv:1810.05867  [pdf, other]

    cs.CL

    An Empirical Study on Crosslingual Transfer in Probabilistic Topic Models

    Authors: Shudong Hao, Michael J. Paul

    Abstract: Probabilistic topic modeling is a popular choice as the first step of crosslingual tasks to enable knowledge transfer and extract multilingual features. While many multilingual topic models have been developed, their assumptions on the training corpus are quite varied, and it is not clear how well the models can be applied under various training conditions. In this paper, we systematically study t…

    Submitted 10 June, 2019; v1 submitted 13 October, 2018; originally announced October 2018.

  7. arXiv:1806.04270  [pdf, other]

    cs.CL

    Learning Multilingual Topics from Incomparable Corpus

    Authors: Shudong Hao, Michael J. Paul

    Abstract: Multilingual topic models enable crosslingual tasks by extracting consistent topics from multilingual corpora. Most models require parallel or comparable training corpora, which limits their ability to generalize. In this paper, we first demystify the knowledge transfer mechanism behind multilingual topic models by defining an alternative but equivalent formulation. Based on this analysis, we then…

    Submitted 11 June, 2018; originally announced June 2018.

    Comments: To appear in International Conference on Computational Linguistics (COLING), 2018

  8. arXiv:1804.10184  [pdf, other]

    cs.CL

    Lessons from the Bible on Modern Topics: Low-Resource Multilingual Topic Model Evaluation

    Authors: Shudong Hao, Jordan Boyd-Graber, Michael J. Paul

    Abstract: Multilingual topic models enable document analysis across languages through coherent multilingual summaries of the data. However, there is no standard and effective metric to evaluate the quality of multilingual topics. We introduce a new intrinsic evaluation of multilingual topic models that correlates well with human judgments of multilingual topic coherence as well as performance in downstream…

    Submitted 26 April, 2018; originally announced April 2018.

    Comments: North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), New Orleans, Louisiana. June 2018