Showing 1–1 of 1 results for author: Gentile, A L

  1. arXiv:2108.11948 [pdf, other]

    cs.CL cs.IR

    SAUCE: Truncated Sparse Document Signature Bit-Vectors for Fast Web-Scale Corpus Expansion

    Authors: Muntasir Wahed, Daniel Gruhl, Alfredo Alba, Anna Lisa Gentile, Petar Ristoski, Chad Deluca, Steve Welch, Ismini Lourentzou

    Abstract: Recent advances in text representation have shown that training on large amounts of text is crucial for natural language understanding. However, models trained without predefined notions of topical interest typically require careful fine-tuning when transferred to specialized domains. When a sufficient amount of within-domain text may not be available, expanding a seed corpus of relevant documents…

    Submitted 26 August, 2021; originally announced August 2021.

    Comments: Accepted to CIKM'21 Applied Research Track