
Showing 1–8 of 8 results for author: Lewis, D D

  1. TARexp: A Python Framework for Technology-Assisted Review Experiments

    Authors: Eugene Yang, David D. Lewis

    Abstract: Technology-assisted review (TAR) is an important industrial application of information retrieval (IR) and machine learning (ML). While a small TAR research community exists, the complexity of TAR software and workflows is a major barrier to entry. Drawing on past open source TAR efforts, as well as design patterns from the IR and ML open source software, we present an open source Python framework…

    Submitted 24 April, 2022; v1 submitted 23 February, 2022; originally announced February 2022.

    Comments: 6 pages, 4 figures, accepted as a SIGIR 2022 demo paper

  2. arXiv:2108.12752  [pdf, other]

    cs.IR

    TAR on Social Media: A Framework for Online Content Moderation

    Authors: Eugene Yang, David D. Lewis, Ophir Frieder

    Abstract: Content moderation (removing or limiting the distribution of posts based on their contents) is one tool social networks use to fight problems such as harassment and disinformation. Manually screening all content is usually impractical given the scale of social media data, and the need for nuanced human interpretations makes fully automated approaches infeasible. We consider content moderation from…

    Submitted 29 August, 2021; originally announced August 2021.

    Comments: 9 pages, 2 figures, accepted at DESIRES 2021

  3. Certifying One-Phase Technology-Assisted Reviews

    Authors: David D. Lewis, Eugene Yang, Ophir Frieder

    Abstract: Technology-assisted review (TAR) workflows based on iterative active learning are widely used in document review applications. Most stopping rules for one-phase TAR workflows lack valid statistical guarantees, which has discouraged their use in some legal contexts. Drawing on the theory of quantile estimation, we provide the first broadly applicable and statistically valid sample-based stopping ru…

    Submitted 29 August, 2021; originally announced August 2021.

    Comments: 10 pages, 4 figures, accepted at CIKM 2021

  4. Heuristic Stopping Rules For Technology-Assisted Review

    Authors: Eugene Yang, David D. Lewis, Ophir Frieder

    Abstract: Technology-assisted review (TAR) refers to human-in-the-loop active learning workflows for finding relevant documents in large collections. These workflows often must meet a target for the proportion of relevant documents found (i.e. recall) while also holding down costs. A variety of heuristic stopping rules have been suggested for striking this tradeoff in particular settings, but none have been…

    Submitted 17 June, 2021; originally announced June 2021.

    Comments: 10 pages, 2 figures. Accepted at DocEng 21

  5. On Minimizing Cost in Legal Document Review Workflows

    Authors: Eugene Yang, David D. Lewis, Ophir Frieder

    Abstract: Technology-assisted review (TAR) refers to human-in-the-loop machine learning workflows for document review in legal discovery and other high recall review tasks. Attorneys and legal technologists have debated whether review should be a single iterative process (one-phase TAR workflows) or whether model training and review should be separate (two-phase TAR workflows), with implications for the cho…

    Submitted 17 June, 2021; originally announced June 2021.

    Comments: 10 pages, 3 figures. Accepted at DocEng 21

  6. arXiv:2105.01044  [pdf, other]

    cs.IR cs.CL

    Goldilocks: Just-Right Tuning of BERT for Technology-Assisted Review

    Authors: Eugene Yang, Sean MacAvaney, David D. Lewis, Ophir Frieder

    Abstract: Technology-assisted review (TAR) refers to iterative active learning workflows for document review in high recall retrieval (HRR) tasks. TAR research and most commercial TAR software have applied linear models such as logistic regression to lexical features. Transformer-based models with supervised tuning are known to improve effectiveness on many text classification tasks, suggesting their use in…

    Submitted 19 January, 2022; v1 submitted 3 May, 2021; originally announced May 2021.

    Comments: 6 pages, 1 figure, accepted at ECIR 2022

  7. arXiv:1911.03969  [pdf, ps, other]

    math.GR

    Centralizer-like Subgroups Associated with the $n$-Engel Words Inside of Direct Product Groups

    Authors: Bridget Lee, Maggie Reardon, Faculty Mentor Dr. Dandrielle Lewis

    Abstract: This research provides a characterization of centralizer-like subgroups associated with the $n$-Engel word in a direct product of groups. Specifically, properties of the set of right $n$-Engel elements inside of direct products are explored. A proof is given to demonstrate the equivalence between the set of right $n$-Engel elements of a direct product of two groups and a direct product of the set…

    Submitted 15 March, 2021; v1 submitted 10 November, 2019; originally announced November 2019.

  8. arXiv:cmp-lg/9407020  [pdf, ps]

    cs.CL

    A Sequential Algorithm for Training Text Classifiers

    Authors: David D. Lewis, William A. Gale

    Abstract: The ability to cheaply train text classifiers is critical to their use in information retrieval, content analysis, natural language processing, and other tasks involving data which is partly or fully textual. An algorithm for sequential sampling during machine learning of statistical classifiers was developed and tested on a newswire text categorization task. This method, which we call uncertain…

    Submitted 24 July, 1994; v1 submitted 24 July, 1994; originally announced July 1994.

    Comments: 10 pages, uuencoded, compressed PostScript; Proc. SIGIR-94. LaTeX available from [email protected]