Showing 1–6 of 6 results for author: Derby, S

  1. arXiv:2309.13080  [pdf, ps, other]

    cs.CL cs.LG

    SPICED: News Similarity Detection Dataset with Multiple Topics and Complexity Levels

    Authors: Elena Shushkevich, Long Mai, Manuel V. Loureiro, Steven Derby, Tri Kurniawan Wijaya

    Abstract: Nowadays, the use of intelligent systems to detect redundant information in news articles has become especially prevalent with the proliferation of news media outlets in order to enhance user experience. However, the heterogeneous nature of news can lead to spurious findings in these systems: Simple heuristics such as whether a pair of news are both about politics can provide strong but deceptive…

    Submitted 21 September, 2023; originally announced September 2023.

  2. arXiv:2302.12784  [pdf, other]

    cs.CL cs.AI cs.LG

    STA: Self-controlled Text Augmentation for Improving Text Classifications

    Authors: Congcong Wang, Gonzalo Fiz Pontiveros, Steven Derby, Tri Kurniawan Wijaya

    Abstract: Despite recent advancements in Machine Learning, many tasks still involve working in low-data regimes which can make solving natural language problems difficult. Recently, a number of text augmentation techniques have emerged in the field of Natural Language Processing (NLP) which can enrich the training data with new examples, though they are not without their caveats. For instance, simple rule-b…

    Submitted 24 February, 2023; originally announced February 2023.

  3. arXiv:2301.02458  [pdf, other]

    cs.CL cs.LG

    Topics as Entity Clusters: Entity-based Topics from Language Models and Graph Neural Networks

    Authors: Manuel V. Loureiro, Steven Derby, Tri Kurniawan Wijaya

    Abstract: Topic models aim to reveal the latent structure behind a corpus, typically conducted over a bag-of-words representation of documents. In the context of topic modeling, most vocabulary is either irrelevant for uncovering underlying topics or contains strong relationships with relevant concepts, impacting the interpretability of these topics. Furthermore, their limited expressiveness and dependency…

    Submitted 6 January, 2023; originally announced January 2023.

    Comments: 12 pages, 1 figure

  4. arXiv:2212.11856  [pdf, other]

    cs.CL cs.AI

    Multilingual News Location Detection using an Entity-Based Siamese Network with Semi-Supervised Contrastive Learning and Knowledge Base

    Authors: Víctor Suárez-Paniagua, Steven Derby, Tri Kurniawan Wijaya

    Abstract: Early detection of relevant locations in a piece of news is especially important in extreme events such as environmental disasters, war conflicts, disease outbreaks, or political turmoils. Additionally, this detection also helps recommender systems to promote relevant news based on user locations. Note that, when the relevant locations are not mentioned explicitly in the text, state-of-the-art met…

    Submitted 22 December, 2022; originally announced December 2022.

  5. arXiv:1908.11439  [pdf, other]

    cs.CL

    Feature2Vec: Distributional semantic modelling of human property knowledge

    Authors: Steven Derby, Paul Miller, Barry Devereux

    Abstract: Feature norm datasets of human conceptual knowledge, collected in surveys of human volunteers, yield highly interpretable models of word meaning and play an important role in neurolinguistic research on semantic cognition. However, these datasets are limited in size due to practical obstacles associated with exhaustively listing properties for a large number of words. In contrast, the development…

    Submitted 29 August, 2019; originally announced August 2019.

    Comments: 7 pages, Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP 2019)

    Journal ref: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP 2019)

  6. arXiv:1809.02534  [pdf, other]

    cs.CL

    Using Sparse Semantic Embeddings Learned from Multimodal Text and Image Data to Model Human Conceptual Knowledge

    Authors: Steven Derby, Paul Miller, Brian Murphy, Barry Devereux

    Abstract: Distributional models provide a convenient way to model semantics using dense embedding spaces derived from unsupervised learning algorithms. However, the dimensions of dense embedding spaces are not designed to resemble human semantic knowledge. Moreover, embeddings are often built from a single source of information (typically text data), even though neurocognitive research suggests that semanti…

    Submitted 14 November, 2018; v1 submitted 7 September, 2018; originally announced September 2018.

    Comments: Proceedings of the 22nd Conference on Computational Natural Language Learning (CoNLL 2018), pages 260-270. Brussels, Belgium, October 31 - November 1, 2018. Association for Computational Linguistics

    Journal ref: Proceedings of the 22nd Conference on Computational Natural Language Learning (CoNLL 2018), pages 260-270. Brussels, Belgium, October 31 - November 1, 2018. Association for Computational Linguistics