
Showing 1–21 of 21 results for author: Frieder, O

  1. Lexically-Accelerated Dense Retrieval

    Authors: Hrishikesh Kulkarni, Sean MacAvaney, Nazli Goharian, Ophir Frieder

    Abstract: Retrieval approaches that score documents based on learned dense vectors (i.e., dense retrieval) rather than lexical signals (i.e., conventional retrieval) are increasingly popular. Their ability to identify related documents that do not necessarily contain the same terms as those appearing in the user's query (thereby improving recall) is one of their key advantages. However, to actually achieve…

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: SIGIR 2023

  2. arXiv:2211.14155  [pdf, other]

    cs.IR

    Caching Historical Embeddings in Conversational Search

    Authors: Ophir Frieder, Ida Mele, Cristina Ioana Muntean, Franco Maria Nardini, Raffaele Perego, Nicola Tonellotto

    Abstract: Rapid response, namely low latency, is fundamental in search applications; it is particularly so in interactive search sessions, such as those encountered in conversational settings. An observation with a potential to reduce latency asserts that conversational queries exhibit a temporal locality in the lists of documents retrieved. Motivated by this observation, we propose and evaluate a client-si…

    Submitted 25 November, 2022; originally announced November 2022.

  3. arXiv:2209.00655  [pdf]

    cs.LG cs.CY

    Self-supervised Representation Learning on Electronic Health Records with Graph Kernel Infomax

    Authors: Hao-Ren Yao, Nairen Cao, Katina Russell, Der-Chen Chang, Ophir Frieder, Jeremy Fineman

    Abstract: Learning Electronic Health Records (EHRs) representation is a preeminent yet under-discovered research topic. It benefits various clinical decision support applications, e.g., medication outcome prediction or patient similarity search. Current approaches focus on task-specific label supervision on vectorized sequential EHR, which is not applicable to large-scale unsupervised scenarios. Recently, c…

    Submitted 20 February, 2024; v1 submitted 1 September, 2022; originally announced September 2022.

    Comments: Accepted to ACM Transactions on Computing for Healthcare (HEALTH)

  4. arXiv:2108.12752  [pdf, other]

    cs.IR

    TAR on Social Media: A Framework for Online Content Moderation

    Authors: Eugene Yang, David D. Lewis, Ophir Frieder

    Abstract: Content moderation (removing or limiting the distribution of posts based on their contents) is one tool social networks use to fight problems such as harassment and disinformation. Manually screening all content is usually impractical given the scale of social media data, and the need for nuanced human interpretations makes fully automated approaches infeasible. We consider content moderation from…

    Submitted 29 August, 2021; originally announced August 2021.

    Comments: 9 pages, 2 figures, accepted at DESIRES 2021

  5. Certifying One-Phase Technology-Assisted Reviews

    Authors: David D. Lewis, Eugene Yang, Ophir Frieder

    Abstract: Technology-assisted review (TAR) workflows based on iterative active learning are widely used in document review applications. Most stopping rules for one-phase TAR workflows lack valid statistical guarantees, which has discouraged their use in some legal contexts. Drawing on the theory of quantile estimation, we provide the first broadly applicable and statistically valid sample-based stopping ru…

    Submitted 29 August, 2021; originally announced August 2021.

    Comments: 10 pages, 4 figures, accepted at CIKM 2021

  6. Heuristic Stopping Rules For Technology-Assisted Review

    Authors: Eugene Yang, David D. Lewis, Ophir Frieder

    Abstract: Technology-assisted review (TAR) refers to human-in-the-loop active learning workflows for finding relevant documents in large collections. These workflows often must meet a target for the proportion of relevant documents found (i.e. recall) while also holding down costs. A variety of heuristic stopping rules have been suggested for striking this tradeoff in particular settings, but none have been…

    Submitted 17 June, 2021; originally announced June 2021.

    Comments: 10 pages, 2 figures. Accepted at DocEng 21

  7. On Minimizing Cost in Legal Document Review Workflows

    Authors: Eugene Yang, David D. Lewis, Ophir Frieder

    Abstract: Technology-assisted review (TAR) refers to human-in-the-loop machine learning workflows for document review in legal discovery and other high recall review tasks. Attorneys and legal technologists have debated whether review should be a single iterative process (one-phase TAR workflows) or whether model training and review should be separate (two-phase TAR workflows), with implications for the cho…

    Submitted 17 June, 2021; originally announced June 2021.

    Comments: 10 pages, 3 figures. Accepted at DocEng 21

  8. arXiv:2105.01044  [pdf, other]

    cs.IR cs.CL

    Goldilocks: Just-Right Tuning of BERT for Technology-Assisted Review

    Authors: Eugene Yang, Sean MacAvaney, David D. Lewis, Ophir Frieder

    Abstract: Technology-assisted review (TAR) refers to iterative active learning workflows for document review in high recall retrieval (HRR) tasks. TAR research and most commercial TAR software have applied linear models such as logistic regression to lexical features. Transformer-based models with supervised tuning are known to improve effectiveness on many text classification tasks, suggesting their use in…

    Submitted 19 January, 2022; v1 submitted 3 May, 2021; originally announced May 2021.

    Comments: 6 pages, 1 figure, accepted at ECIR 2022

  9. arXiv:2102.02446  [pdf, other]

    cs.LG math.GT

    The Analysis from Nonlinear Distance Metric to Kernel-based Drug Prescription Prediction System

    Authors: Der-Chen Chang, Ophir Frieder, Chi-Feng Hung, Hao-Ren Yao

    Abstract: Distance metrics and their nonlinear variant play a crucial role in machine learning based real-world problem solving. We demonstrated how Euclidean and cosine distance measures differ not only theoretically but also in real-world medical application, namely, outcome prediction of drug prescription. Euclidean distance exhibits favorable properties in the local geometry problem. To this regard, Euc…

    Submitted 23 February, 2021; v1 submitted 4 February, 2021; originally announced February 2021.

    Comments: Accepted to Journal of Nonlinear and Variational Analysis, JNVA 2021

  10. Cross-Global Attention Graph Kernel Network Prediction of Drug Prescription

    Authors: Hao-Ren Yao, Der-Chen Chang, Ophir Frieder, Wendy Huang, I-Chia Liang, Chi-Feng Hung

    Abstract: We present an end-to-end, interpretable, deep-learning architecture to learn a graph kernel that predicts the outcome of chronic disease drug prescription. This is achieved through a deep metric learning collaborative with a Support Vector Machine objective using a graphical representation of Electronic Health Records. We formulate the predictive model as a binary graph classification problem with…

    Submitted 4 August, 2020; originally announced August 2020.

    Comments: ACM-BCB 2020 (Full paper)

    Journal ref: Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics (BCB '20), September 21-24, 2020, Virtual Event, USA

  11. arXiv:2007.14477  [pdf, ps, other]

    cs.CL

    GUIR at SemEval-2020 Task 12: Domain-Tuned Contextualized Models for Offensive Language Detection

    Authors: Sajad Sotudeh, Tong Xiang, Hao-Ren Yao, Sean MacAvaney, Eugene Yang, Nazli Goharian, Ophir Frieder

    Abstract: Offensive language detection is an important and challenging task in natural language processing. We present our submissions to the OffensEval 2020 shared task, which includes three English sub-tasks: identifying the presence of offensive language (Sub-task A), identifying the presence of target in offensive language (Sub-task B), and identifying the categories of the target (Sub-task C). Our expe…

    Submitted 28 July, 2020; originally announced July 2020.

    Comments: SemEval 2020

  12. arXiv:2005.08805  [pdf, ps, other]

    cs.CL

    Interaction Matching for Long-Tail Multi-Label Classification

    Authors: Sean MacAvaney, Franck Dernoncourt, Walter Chang, Nazli Goharian, Ophir Frieder

    Abstract: We present an elegant and effective approach for addressing limitations in existing multi-label classification models by incorporating interaction matching, a concept shown to be useful for ad-hoc search result ranking. By performing soft n-gram interaction matching, we match labels with natural language descriptions (which are common to have in most multi-labeling tasks). Our approach can be used…

    Submitted 18 May, 2020; originally announced May 2020.

  13. Training Curricula for Open Domain Answer Re-Ranking

    Authors: Sean MacAvaney, Franco Maria Nardini, Raffaele Perego, Nicola Tonellotto, Nazli Goharian, Ophir Frieder

    Abstract: In precision-oriented tasks like answer ranking, it is more important to rank many relevant answers highly than to retrieve all relevant answers. It follows that a good ranking strategy would be to learn how to identify the easiest correct answers first (i.e., assign a high ranking score to answers that have characteristics that usually indicate relevance, and a low ranking score to those with cha…

    Submitted 21 May, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

    Comments: Accepted at SIGIR 2020 (long)

  14. Efficient Document Re-Ranking for Transformers by Precomputing Term Representations

    Authors: Sean MacAvaney, Franco Maria Nardini, Raffaele Perego, Nicola Tonellotto, Nazli Goharian, Ophir Frieder

    Abstract: Deep pretrained transformer networks are effective at various ranking tasks, such as question answering and ad-hoc document ranking. However, their computational expenses deem them cost-prohibitive in practice. Our proposed approach, called PreTTR (Precomputing Transformer Term Representations), considerably reduces the query-time latency of deep transformer networks (up to a 42x speedup on web do…

    Submitted 26 May, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

    Comments: Accepted at SIGIR 2020 (long)

  15. Expansion via Prediction of Importance with Contextualization

    Authors: Sean MacAvaney, Franco Maria Nardini, Raffaele Perego, Nicola Tonellotto, Nazli Goharian, Ophir Frieder

    Abstract: The identification of relevance with little textual context is a primary challenge in passage retrieval. We address this problem with a representation-based ranking approach that: (1) explicitly models the importance of each term using a contextualized language model; (2) performs passage expansion by propagating the importance to similar terms; and (3) grounds the representations in the lexicon,…

    Submitted 20 May, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

    Comments: Accepted at SIGIR 2020 (short)

  16. arXiv:2004.14054  [pdf, other]

    cs.IR cs.CL

    Topic Propagation in Conversational Search

    Authors: I. Mele, C. I. Muntean, F. M. Nardini, R. Perego, N. Tonellotto, O. Frieder

    Abstract: In a conversational context, a user expresses her multi-faceted information need as a sequence of natural-language questions, i.e., utterances. Starting from a given topic, the conversation evolves through user utterances and system replies. The retrieval of documents relevant to a given utterance in a conversation is challenging due to ambiguity of natural language and to the difficulty of detect…

    Submitted 29 April, 2020; originally announced April 2020.

    Comments: 5 pages

  17. arXiv:2001.03010  [pdf, other]

    cs.IR cs.DB

    Topical Result Caching in Web Search Engines

    Authors: Ida Mele, Nicola Tonellotto, Ophir Frieder, Raffaele Perego

    Abstract: Caching search results is employed in information retrieval systems to expedite query processing and reduce back-end server workload. Motivated by the observation that queries belonging to different topics have different temporal-locality patterns, we investigate a novel caching model called STD (Static-Topic-Dynamic cache). It improves traditional SDC (Static-Dynamic Cache) that stores in a stati…

    Submitted 9 January, 2020; originally announced January 2020.

  18. Overcoming low-utility facets for complex answer retrieval

    Authors: Sean MacAvaney, Andrew Yates, Arman Cohan, Luca Soldaini, Kai Hui, Nazli Goharian, Ophir Frieder

    Abstract: Many questions cannot be answered simply; their answers must include numerous nuanced details and additional context. Complex Answer Retrieval (CAR) is the retrieval of answers to such questions. In their simplest form, these questions are constructed from a topic entity (e.g., `cheese') and a facet (e.g., `health effects'). While topic matching has been thoroughly explored, we observe that some f…

    Submitted 21 November, 2018; originally announced November 2018.

    Comments: This is a pre-print of an article published in Information Retrieval Journal. The final authenticated version (including additional experimental results, analysis, etc.) is available online at: https://doi.org/10.1007/s10791-018-9343-0

    Journal ref: Information Retrieval Journal 2018

  19. arXiv:1805.00791  [pdf, other]

    cs.IR

    Characterizing Question Facets for Complex Answer Retrieval

    Authors: Sean MacAvaney, Andrew Yates, Arman Cohan, Luca Soldaini, Kai Hui, Nazli Goharian, Ophir Frieder

    Abstract: Complex answer retrieval (CAR) is the process of retrieving answers to questions that have multifaceted or nuanced answers. In this work, we present two novel approaches for CAR based on the observation that question facets can vary in utility: from structural (facets that can apply to many similar topics, such as 'History') to topical (facets that are specific to the question's topic, such as the…

    Submitted 2 May, 2018; originally announced May 2018.

    Comments: 4 pages; SIGIR 2018 Short Paper

  20. Content-Based Weak Supervision for Ad-Hoc Re-Ranking

    Authors: Sean MacAvaney, Andrew Yates, Kai Hui, Ophir Frieder

    Abstract: One challenge with neural ranking is the need for a large amount of manually-labeled relevance judgments for training. In contrast with prior work, we examine the use of weak supervision sources for training that yield pseudo query-document pairs that already exhibit relevance (e.g., newswire headline-content pairs and encyclopedic heading-paragraph pairs). We also propose filtering techniques to…

    Submitted 5 July, 2019; v1 submitted 1 July, 2017; originally announced July 2017.

    Comments: SIGIR 2019 (short paper)

  21. arXiv:1410.6121  [pdf, other]

    cond-mat.str-el cond-mat.stat-mech cs.CC cs.CE math-ph

    The Nonequilibrium Many-Body Problem as a paradigm for extreme data science

    Authors: J. K. Freericks, B. K. Nikolic, O. Frieder

    Abstract: Generating big data pervades much of physics. But some problems, which we call extreme data problems, are too large to be treated within big data science. The nonequilibrium quantum many-body problem on a lattice is just such a problem, where the Hilbert space grows exponentially with system size and rapidly becomes too large to fit on any computer (and can be effectively thought of as an infinite…

    Submitted 9 December, 2014; v1 submitted 22 October, 2014; originally announced October 2014.

    Comments: 33 pages, 7 figures, invited review for Int. J. Mod. Phys. B; published version with additional references

    Journal ref: Int J. Mod. Phys. B 28, 1430021 (2014)