
Showing 1–17 of 17 results for author: Zhang, M J

Searching in archive cs.
  1. arXiv:2311.09469  [pdf, other]

    cs.CL

    Clarify When Necessary: Resolving Ambiguity Through Interaction with LMs

    Authors: Michael J. Q. Zhang, Eunsol Choi

    Abstract: Resolving ambiguities through interaction is a hallmark of natural language, and modeling this behavior is a core challenge in crafting AI assistants. In this work, we study such behavior in LMs by proposing a task-agnostic framework for resolving ambiguity by asking users clarifying questions. Our framework breaks down this objective into three subtasks: (1) determining when clarification is need…

    Submitted 15 November, 2023; originally announced November 2023.

  2. arXiv:2310.18844  [pdf, other]

    cs.LG cs.AI

    BanditPAM++: Faster $k$-medoids Clustering

    Authors: Mo Tiwari, Ryan Kang, Donghyun Lee, Sebastian Thrun, Chris Piech, Ilan Shomorony, Martin Jinye Zhang

    Abstract: Clustering is a fundamental task in data science with wide-ranging applications. In $k$-medoids clustering, cluster centers must be actual datapoints and arbitrary distance metrics may be used; these features allow for greater interpretability of the cluster centers and the clustering of exotic objects in $k$-medoids clustering, respectively. $k$-medoids clustering has recently grown in popularity…

    Submitted 28 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

    MSC Class: 68

    ACM Class: I.m; I.2.0; I.2.6; K.3.2; I.2.m

  3. arXiv:2306.09306  [pdf, other]

    cs.CL

    Propagating Knowledge Updates to LMs Through Distillation

    Authors: Shankar Padmanabhan, Yasumasa Onoe, Michael J. Q. Zhang, Greg Durrett, Eunsol Choi

    Abstract: Modern language models have the capacity to store and use immense amounts of knowledge about real-world entities, but it remains unclear how to update such knowledge stored in model parameters. While prior methods for updating knowledge in LMs successfully inject atomic facts, updated LMs fail to make inferences based on injected facts. In this work, we demonstrate that a context distillation-base…

    Submitted 30 October, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 Camera Ready

  4. arXiv:2305.14824  [pdf, other]

    cs.CL

    Mitigating Temporal Misalignment by Discarding Outdated Facts

    Authors: Michael J. Q. Zhang, Eunsol Choi

    Abstract: While large language models are able to retain vast amounts of world knowledge seen during pretraining, such knowledge is prone to going out of date and is nontrivial to update. Furthermore, these models are often used under temporal misalignment, tasked with answering questions about the present, despite having only been trained on data collected in the past. To mitigate the effects of temporal m…

    Submitted 5 March, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted into EMNLP 2023

  5. arXiv:2305.14613  [pdf, other]

    cs.CL cs.AI

    Selectively Answering Ambiguous Questions

    Authors: Jeremy R. Cole, Michael J. Q. Zhang, Daniel Gillick, Julian Martin Eisenschlos, Bhuwan Dhingra, Jacob Eisenstein

    Abstract: Trustworthy language models should abstain from answering questions when they do not know the answer. However, the answer to a question can be unknown for a variety of reasons. Prior research has focused on the case in which the question is clear and the answer is unambiguous but possibly unknown, but the answer to a question can also be unclear due to uncertainty of the questioner's intent or con…

    Submitted 14 November, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: To appear in EMNLP 2023. 9 pages, 5 figures, 2 pages of appendix

  6. arXiv:2305.01651  [pdf, other]

    cs.CL

    Can LMs Learn New Entities from Descriptions? Challenges in Propagating Injected Knowledge

    Authors: Yasumasa Onoe, Michael J. Q. Zhang, Shankar Padmanabhan, Greg Durrett, Eunsol Choi

    Abstract: Pre-trained language models (LMs) are used for knowledge intensive tasks like question answering, but their knowledge gets continuously outdated as the world changes. Prior work has studied targeted updates to LMs, injecting individual facts and evaluating whether the model learns these facts while not changing predictions on other contexts. We take a step forward and study LMs' abilities to make…

    Submitted 2 May, 2023; originally announced May 2023.

    Comments: ACL 2023

  7. arXiv:2303.00242  [pdf, other]

    cs.CL

    DIFFQG: Generating Questions to Summarize Factual Changes

    Authors: Jeremy R. Cole, Palak Jain, Julian Martin Eisenschlos, Michael J. Q. Zhang, Eunsol Choi, Bhuwan Dhingra

    Abstract: Identifying the difference between two versions of the same article is useful to update knowledge bases and to understand how articles evolve. Paired texts occur naturally in diverse situations: reporters write similar news stories and maintainers of authoritative websites must keep their information up to date. We propose representing factual changes between paired documents as question-answer pa…

    Submitted 1 March, 2023; originally announced March 2023.

    Comments: 14 pages. Accepted at EACL 2023 (main, long)

  8. arXiv:2212.07551  [pdf, ps, other]

    cs.LG cs.AI

    Faster Maximum Inner Product Search in High Dimensions

    Authors: Mo Tiwari, Ryan Kang, Je-Yong Lee, Donghyun Lee, Chris Piech, Sebastian Thrun, Ilan Shomorony, Martin Jinye Zhang

    Abstract: Maximum Inner Product Search (MIPS) is a ubiquitous task in machine learning applications such as recommendation systems. Given a query vector and $n$ atom vectors in $d$-dimensional space, the goal of MIPS is to find the atom that has the highest inner product with the query vector. Existing MIPS algorithms scale at least as $O(\sqrt{d})$, which becomes computationally prohibitive in high-dimensi…

    Submitted 26 June, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

    Comments: 24 pages

  9. arXiv:2212.07473  [pdf, ps, other]

    cs.LG cs.DS

    MABSplit: Faster Forest Training Using Multi-Armed Bandits

    Authors: Mo Tiwari, Ryan Kang, Je-Yong Lee, Sebastian Thrun, Chris Piech, Ilan Shomorony, Martin Jinye Zhang

    Abstract: Random forests are some of the most widely used machine learning models today, especially in domains that necessitate interpretability. We present an algorithm that accelerates the training of random forests and other popular tree-based learning methods. At the core of our algorithm is a novel node-splitting subroutine, dubbed MABSplit, used to efficiently find split points when constructing decis…

    Submitted 14 December, 2022; originally announced December 2022.

    Comments: Published at NeurIPS 2022, 30 pages

    ACM Class: I.2.8

  10. arXiv:2210.13701  [pdf, other]

    cs.CL

    Rich Knowledge Sources Bring Complex Knowledge Conflicts: Recalibrating Models to Reflect Conflicting Evidence

    Authors: Hung-Ting Chen, Michael J. Q. Zhang, Eunsol Choi

    Abstract: Question answering models can use rich knowledge sources -- up to one hundred retrieved passages and parametric knowledge in the large-scale language model (LM). Prior work assumes information in such knowledge sources is consistent with each other, paying little attention to how models blend information stored in their LM parameters with that from retrieved evidence documents. In this paper, we s…

    Submitted 24 October, 2022; originally announced October 2022.

    Comments: Accepted to the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022)

  11. arXiv:2205.02832  [pdf, other]

    cs.CL

    Entity Cloze By Date: What LMs Know About Unseen Entities

    Authors: Yasumasa Onoe, Michael J. Q. Zhang, Eunsol Choi, Greg Durrett

    Abstract: Language models (LMs) are typically trained once on a large-scale corpus and used for years without being updated. However, in a dynamic world, new entities constantly arise. We propose a framework to analyze what LMs can infer about new entities that did not exist when the LMs were pretrained. We derive a dataset of entities indexed by their origination date and paired with their English Wikipedi…

    Submitted 5 May, 2022; originally announced May 2022.

    Comments: NAACL 2022 Findings

  12. arXiv:2109.06157  [pdf, other]

    cs.CL

    SituatedQA: Incorporating Extra-Linguistic Contexts into QA

    Authors: Michael J. Q. Zhang, Eunsol Choi

    Abstract: Answers to the same question may change depending on the extra-linguistic contexts (when and where the question was asked). To study this challenge, we introduce SituatedQA, an open-retrieval QA dataset where systems must produce the correct answer to a question given the temporal or geographical context. To construct SituatedQA, we first identify such questions in existing QA datasets. We find th…

    Submitted 13 September, 2021; originally announced September 2021.

    Comments: Accepted at EMNLP 2021

  13. arXiv:2109.01653  [pdf, other]

    cs.CL cs.AI

    CREAK: A Dataset for Commonsense Reasoning over Entity Knowledge

    Authors: Yasumasa Onoe, Michael J. Q. Zhang, Eunsol Choi, Greg Durrett

    Abstract: Most benchmark datasets targeting commonsense reasoning focus on everyday scenarios: physical knowledge like knowing that you could fill a cup under a waterfall [Talmor et al., 2019], social knowledge like bumping into someone is awkward [Sap et al., 2019], and other generic situations. However, there is a rich space of commonsense inferences anchored to knowledge about specific entities: for exam…

    Submitted 3 September, 2021; originally announced September 2021.

  14. arXiv:2006.06856  [pdf, other]

    cs.LG cs.AI stat.ML

    BanditPAM: Almost Linear Time $k$-Medoids Clustering via Multi-Armed Bandits

    Authors: Mo Tiwari, Martin Jinye Zhang, James Mayclin, Sebastian Thrun, Chris Piech, Ilan Shomorony

    Abstract: Clustering is a ubiquitous task in data science. Compared to the commonly used $k$-means clustering, $k$-medoids clustering requires the cluster centers to be actual data points and support arbitrary distance metrics, which permits greater interpretability and the clustering of structured objects. Current state-of-the-art $k$-medoids clustering algorithms, such as Partitioning Around Medoids (PAM)…

    Submitted 6 December, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: 21 pages, NeurIPS 2020

  15. arXiv:1902.00197  [pdf, other]

    stat.ME cs.IT q-bio.GN

    Adaptive Monte Carlo Multiple Testing via Multi-Armed Bandits

    Authors: Martin J. Zhang, James Zou, David Tse

    Abstract: Monte Carlo (MC) permutation test is considered the gold standard for statistical hypothesis testing, especially when standard parametric assumptions are not clear or likely to fail. However, in modern data science settings where a large number of hypothesis tests need to be performed simultaneously, it is rarely used due to its prohibitive computational cost. In genome-wide association studies, f…

    Submitted 18 May, 2019; v1 submitted 1 February, 2019; originally announced February 2019.

  16. arXiv:1711.00817  [pdf, other]

    stat.ML cs.DS cs.IT cs.LG

    Medoids in almost linear time via multi-armed bandits

    Authors: Vivek Bagaria, Govinda M. Kamath, Vasilis Ntranos, Martin J. Zhang, David Tse

    Abstract: Computing the medoid of a large number of points in high-dimensional space is an increasingly common operation in many data science problems. We present an algorithm Med-dit which uses O(n log n) distance evaluations to compute the medoid with high probability. Med-dit is based on a connection with the multi-armed bandit problem. We evaluate the performance of Med-dit empirically on the Netflix-pr…

    Submitted 7 November, 2017; v1 submitted 2 November, 2017; originally announced November 2017.

  17. arXiv:1709.06716  [pdf, other]

    stat.ML cs.LG

    Contrastive Principal Component Analysis

    Authors: Abubakar Abid, Martin J. Zhang, Vivek K. Bagaria, James Zou

    Abstract: We present a new technique called contrastive principal component analysis (cPCA) that is designed to discover low-dimensional structure that is unique to a dataset, or enriched in one dataset relative to other data. The technique is a generalization of standard PCA, for the setting where multiple datasets are available -- e.g. a treatment and a control group, or a mixed versus a homogeneous popul…

    Submitted 21 November, 2017; v1 submitted 19 September, 2017; originally announced September 2017.

    Comments: main body is 10 pages, 9 figures