Showing 1–1 of 1 results for author: Kadkhodaei, H

  1. arXiv:2111.05062  [pdf, other]

    cs.LG

    Look back, look around: a systematic analysis of effective predictors for new outlinks in focused Web crawling

    Authors: Thi Kim Nhung Dang, Doina Bucur, Berk Atil, Guillaume Pitel, Frank Ruis, Hamidreza Kadkhodaei, Nelly Litvak

    Abstract: Small and medium enterprises rely on detailed Web analytics to be informed about their market and competition. Focused crawlers meet this demand by crawling and indexing specific parts of the Web. Critically, a focused crawler must quickly find new pages that have not yet been indexed. Since a new page can be discovered only by following a new outlink, predicting new outlinks is very relevant in p…

    Submitted 15 November, 2022; v1 submitted 9 November, 2021; originally announced November 2021.

    Comments: 23 pages, 15 figures, 4 tables, uses arxiv.sty, added new title, heuristic features and their results added, figures 7, 14, and 15 updated, accepted version