Daniel Galvez is qualified to endorse.

LSH methods for data deduplication in a Wikipedia artificial dataset

Daniel Galvez is registered as an author of this paper and can endorse for cs.CL, cs.IR, and cs.LG.

Juan Ciro, Tim Schlippe, and David Kanter are not registered as owners of this paper.