Showing 1–13 of 13 results for author: Steyvers, M

  1. arXiv:2401.13835

    cs.LG cs.AI cs.CL cs.HC

    The Calibration Gap between Model and Human Confidence in Large Language Models

    Authors: Mark Steyvers, Heliodoro Tejeda, Aakriti Kumar, Catarina Belem, Sheer Karny, Xinyue Hu, Lukas Mayer, Padhraic Smyth

    Abstract: For large language models (LLMs) to be trusted by humans they need to be well-calibrated in the sense that they can accurately assess and communicate how likely it is that their predictions are correct. Recent work has focused on the quality of internal LLM confidence assessments, but the question remains of how well LLMs can communicate this internal model confidence to human users. This paper ex…

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: 27 pages, 10 figures

  2. arXiv:2312.07679

    cs.LG stat.ML

    Bayesian Online Learning for Consensus Prediction

    Authors: Sam Showalter, Alex Boyd, Padhraic Smyth, Mark Steyvers

    Abstract: Given a pre-trained classifier and multiple human experts, we investigate the task of online classification where model predictions are provided for free but querying humans incurs a cost. In this practical but under-explored setting, oracle ground truth is not available. Instead, the prediction target is defined as the consensus vote of all experts. Given that querying full consensus can be costl…

    Submitted 12 December, 2023; originally announced December 2023.

  3. arXiv:2305.09064

    cs.LG cs.AI cs.HC

    Capturing Humans' Mental Models of AI: An Item Response Theory Approach

    Authors: Markelle Kelly, Aakriti Kumar, Padhraic Smyth, Mark Steyvers

    Abstract: Improving our understanding of how humans perceive AI teammates is an important foundation for our general understanding of human-AI teams. Extending relevant work from cognitive science, we propose a framework based on item response theory for modeling these perceptions. We apply this framework to real-world experiments, in which each participant works alongside another person or an AI agent in a…

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: FAccT 2023

  4. arXiv:2301.11916

    cs.CL cs.AI cs.LG

    Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning

    Authors: Xinyi Wang, Wanrong Zhu, Michael Saxon, Mark Steyvers, William Yang Wang

    Abstract: In recent years, pre-trained large language models (LLMs) have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning. However, existing literature has highlighted the sensitivity of this capability to the selection of few-shot demonstrations. Current understandings of the underlying mechanisms by which this capability arises fro…

    Submitted 12 February, 2024; v1 submitted 27 January, 2023; originally announced January 2023.

    Comments: code at: https://github.com/WANGXinyiLinda/concept-based-demonstration-selection Accepted to NeurIPS 2023

  5. arXiv:2109.14591

    cs.LG stat.ML

    Combining Human Predictions with Model Probabilities via Confusion Matrices and Calibration

    Authors: Gavin Kerrigan, Padhraic Smyth, Mark Steyvers

    Abstract: An increasingly common use case for machine learning models is augmenting the abilities of human decision makers. For classification tasks where neither the human or model are perfectly accurate, a key step in obtaining high performance is combining their individual predictions in a manner that leverages their relative strengths. In this work, we develop a set of algorithms that combine the probab…

    Submitted 1 October, 2021; v1 submitted 29 September, 2021; originally announced September 2021.

    Comments: NeurIPS 2021

  6. arXiv:2010.09851

    stat.ML cs.AI cs.LG

    Can I Trust My Fairness Metric? Assessing Fairness with Unlabeled Data and Bayesian Inference

    Authors: Disi Ji, Padhraic Smyth, Mark Steyvers

    Abstract: We investigate the problem of reliably assessing group fairness when labeled examples are few but unlabeled examples are plentiful. We propose a general Bayesian framework that can augment labeled data with unlabeled data to produce more accurate and lower-variance estimates compared to methods based on labeled data alone. Our approach estimates calibrated scores for unlabeled examples in each gro…

    Submitted 19 October, 2020; originally announced October 2020.

    Comments: 27 pages

  7. arXiv:2002.06532

    stat.ML cs.LG

    Active Bayesian Assessment for Black-Box Classifiers

    Authors: Disi Ji, Robert L. Logan IV, Padhraic Smyth, Mark Steyvers

    Abstract: Recent advances in machine learning have led to increased deployment of black-box classifiers across a wide variety of applications. In many such situations there is a critical need to both reliably assess the performance of these pre-trained models and to perform this assessment in a label-efficient manner (given that labels may be scarce and costly to collect). In this paper, we introduce an act…

    Submitted 15 March, 2021; v1 submitted 16 February, 2020; originally announced February 2020.

  8. arXiv:1808.02157

    q-bio.NC

    Experimental Design Modulates Variance in BOLD Activation: The Variance Design General Linear Model

    Authors: Garren Gaut, Xiangrui Li, Zhong-Lin Lu, Mark Steyvers

    Abstract: Typical fMRI studies have focused on either the mean trend in the blood-oxygen-level-dependent (BOLD) time course or functional connectivity (FC). However, other statistics of the neuroimaging data may contain important information. Despite studies showing links between the variance in the BOLD time series (BV) and age and cognitive performance, a formal framework for testing these effects has not…

    Submitted 6 August, 2018; originally announced August 2018.

    Comments: 18 pages, 7 figures

  9. arXiv:1807.04745

    q-bio.NC

    Predicting Task and Subject Differences with Functional Connectivity and BOLD Variability

    Authors: Garren Gaut, Xiangrui Li, Brandon Turner, William A. Cunningham, Zhong-Lin Lu, Mark Steyvers

    Abstract: Previous research has found that functional connectivity (FC) can accurately predict the identity of a subject performing a task and the type of task being performed. We replicate these results using a large dataset collected at the OSU Center for Cognitive and Behavioral Brain Imaging. We also introduce a novel perspective on task and subject identity prediction: BOLD Variability (BV). Conceptual…

    Submitted 12 July, 2018; originally announced July 2018.

  10. arXiv:1207.4169

    cs.IR cs.LG stat.ML

    The Author-Topic Model for Authors and Documents

    Authors: Michal Rosen-Zvi, Thomas Griffiths, Mark Steyvers, Padhraic Smyth

    Abstract: We introduce the author-topic model, a generative model for documents that extends Latent Dirichlet Allocation (LDA; Blei, Ng, & Jordan, 2003) to include authorship information. Each author is associated with a multinomial distribution over topics and each topic is associated with a multinomial distribution over words. A document with multiple authors is modeled as a distribution over topics that…

    Submitted 11 July, 2012; originally announced July 2012.

    Comments: Appears in Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI2004)

    Report number: UAI-P-2004-PG-487-494

  11. arXiv:1107.2462

    stat.ML cs.LG

    Statistical Topic Models for Multi-Label Document Classification

    Authors: Timothy N. Rubin, America Chambers, Padhraic Smyth, Mark Steyvers

    Abstract: Machine learning approaches to multi-label document classification have to date largely relied on discriminative modeling techniques such as support vector machines. A drawback of these approaches is that performance rapidly drops off as the total number of labels and the number of labels per document increase. This problem is amplified when the label frequencies exhibit the type of highly skewed…

    Submitted 9 November, 2011; v1 submitted 13 July, 2011; originally announced July 2011.

    Comments: 44 Pages (Including Appendices). To be published in: The Machine Learning Journal, special issue on Learning from Multi-Label Data. Version 2 corrects some typos, updates some of the notation used in the paper for clarification of some equations, and incorporates several relatively minor changes to the text throughout the paper

  12. arXiv:0808.0973

    cs.AI cs.IR

    Text Modeling using Unsupervised Topic Models and Concept Hierarchies

    Authors: Chaitanya Chemudugunta, Padhraic Smyth, Mark Steyvers

    Abstract: Statistical topic models provide a general data-driven framework for automated discovery of high-level knowledge from large collections of text documents. While topic models can potentially discover a broad range of themes in a data set, the interpretability of the learned topics is not always ideal. Human-defined concepts, on the other hand, tend to be semantically richer due to careful selecti…

    Submitted 7 August, 2008; originally announced August 2008.

  13. arXiv:cond-mat/0110012

    cond-mat.soft cond-mat.dis-nn

    The large-scale structure of semantic networks: statistical analyses and a model for semantic growth

    Authors: Mark Steyvers, Joshua B. Tenenbaum

    Abstract: We present statistical analyses of the large-scale structure of three types of semantic networks: word associations, WordNet, and Roget's thesaurus. We show that they have a small-world structure, characterized by sparse connectivity, short average path-lengths between words, and strong local clustering. In addition, the distributions of the number of connections follow power laws that indicate…

    Submitted 1 October, 2001; originally announced October 2001.

    Comments: 25 pages, 9 figures, submitted paper to Cognitive Science