Showing 1–50 of 104 results for author: Wallace, B

Searching in archive cs.
  1. arXiv:2407.00211  [pdf, other]

    cs.CL

    Detection and Measurement of Syntactic Templates in Generated Text

    Authors: Chantal Shaib, Yanai Elazar, Junyi Jessy Li, Byron C. Wallace

    Abstract: Recent work on evaluating the diversity of text generated by LLMs has focused on word-level features. Here we offer an analysis of syntactic features to characterize general repetition in models, beyond frequent n-grams. Specifically, we define syntactic templates and show that models tend to produce templated text in downstream tasks at a higher rate than what is found in human-reference texts. W…

    Submitted 28 June, 2024; originally announced July 2024.

  2. arXiv:2406.20086  [pdf, other]

    cs.CL cs.LG

    Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs

    Authors: Sheridan Feucht, David Atkinson, Byron Wallace, David Bau

    Abstract: LLMs process text as sequences of tokens that roughly correspond to words, where less common words are represented by multiple tokens. However, individual tokens are often semantically unrelated to the meanings of the words/concepts they comprise. For example, Llama-2-7b's tokenizer splits the word "northeastern" into the tokens ['_n', 'ort', 'he', 'astern'], none of which correspond to semantical…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 13 pages, 14 figures. Code and data at https://footprints.baulab.info/

    ACM Class: I.2.7

  3. arXiv:2406.14511  [pdf, other]

    cs.CL

    Investigating Mysteries of CoT-Augmented Distillation

    Authors: Somin Wadhwa, Silvio Amir, Byron C. Wallace

    Abstract: Eliciting "chain of thought" (CoT) rationales -- sequences of tokens that convey a "reasoning" process -- has been shown to consistently improve LLM performance on tasks like question answering. More recent efforts have shown that such rationales can also be used for model distillation: Including CoT sequences (elicited from a large "teacher" model) in addition to target labels when fine-tuning a s…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Draft; under review

  4. arXiv:2406.09330  [pdf, other]

    cs.CL

    Learning from Natural Language Explanations for Generalizable Entity Matching

    Authors: Somin Wadhwa, Adit Krishnan, Runhui Wang, Byron C. Wallace, Chris Kong

    Abstract: Entity matching is the task of linking records from different sources that refer to the same real-world entity. Past work has primarily treated entity linking as a standard supervised learning problem. However, supervised entity matching models often do not generalize well to new data, and collecting exhaustive labeled training data is often cost prohibitive. Further, recent efforts have adopted L…

    Submitted 13 June, 2024; originally announced June 2024.

  5. arXiv:2405.12367  [pdf, other]

    eess.IV cs.CV

    Large-Scale Multi-Center CT and MRI Segmentation of Pancreas with Deep Learning

    Authors: Zheyuan Zhang, Elif Keles, Gorkem Durak, Yavuz Taktak, Onkar Susladkar, Vandan Gorade, Debesh Jha, Asli C. Ormeci, Alpay Medetalibeyoglu, Lanhong Yao, Bin Wang, Ilkin Sevgi Isler, Linkai Peng, Hongyi Pan, Camila Lopes Vendrami, Amir Bourhani, Yury Velichko, Boqing Gong, Concetto Spampinato, Ayis Pyrros, Pallavi Tiwari, Derk C. F. Klatte, Megan Engels, Sanne Hoogenboom, Candice W. Bolan, et al. (13 additional authors not shown)

    Abstract: Automated volumetric segmentation of the pancreas on cross-sectional imaging is needed for diagnosis and follow-up of pancreatic diseases. While CT-based pancreatic segmentation is more established, MRI-based segmentation methods are understudied, largely due to a lack of publicly available datasets, benchmarking research efforts, and domain-specific deep learning methods. In this retrospective st…

    Submitted 25 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: under review version

  6. arXiv:2405.01686  [pdf, other]

    cs.CL cs.AI

    Automatically Extracting Numerical Results from Randomized Controlled Trials with Large Language Models

    Authors: Hye Sun Yun, David Pogrebitskiy, Iain J. Marshall, Byron C. Wallace

    Abstract: Meta-analyses statistically aggregate the findings of different randomized controlled trials (RCTs) to assess treatment effectiveness. Because this yields robust estimates of treatment effectiveness, results from meta-analyses are considered the strongest form of evidence. However, rigorous evidence syntheses are time-consuming and labor-intensive, requiring manual extraction of data from individu…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 24 pages, 7 figures, 6 tables

  7. arXiv:2404.00152  [pdf, other]

    cs.CL

    On-the-fly Definition Augmentation of LLMs for Biomedical NER

    Authors: Monica Munnangi, Sergey Feldman, Byron C Wallace, Silvio Amir, Tom Hope, Aakanksha Naik

    Abstract: Despite their general capabilities, LLMs still struggle on biomedical NER tasks, which are difficult due to the presence of specialized terminology and lack of training data. In this work we set out to improve LLM performance on biomedical NER in limited data settings via a new knowledge augmentation approach which incorporates definitions of relevant concepts on-the-fly. During this process, to p…

    Submitted 23 April, 2024; v1 submitted 29 March, 2024; originally announced April 2024.

    Comments: To appear at NAACL 2024 (Main)

  8. arXiv:2403.00553  [pdf, other]

    cs.CL

    Standardizing the Measurement of Text Diversity: A Tool and a Comparative Analysis of Scores

    Authors: Chantal Shaib, Joe Barrow, Jiuding Sun, Alexa F. Siu, Byron C. Wallace, Ani Nenkova

    Abstract: The diversity across outputs generated by large language models shapes the perception of their quality and utility. Prompt leaks, templated answer structure, and canned responses across different interactions are readily noticed by people, but there is no standard score to measure this aspect of model behavior. In this work we empirically investigate diversity scores on English texts. We find that…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: Preprint

  9. arXiv:2402.18756  [pdf, other]

    cs.CL

    How Much Annotation is Needed to Compare Summarization Models?

    Authors: Chantal Shaib, Joe Barrow, Alexa F. Siu, Byron C. Wallace, Ani Nenkova

    Abstract: Modern instruction-tuned models have become highly capable in text generation tasks such as summarization, and are expected to be released at a steady pace. In practice one may now wish to choose confidently, but with minimal effort, the best performing summarization model when applied to a new domain or purpose. In this work, we empirically investigate the test sample size necessary to select a p…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: Preprint

  10. arXiv:2402.15663  [pdf, other]

    cs.CL

    Leveraging ChatGPT in Pharmacovigilance Event Extraction: An Empirical Study

    Authors: Zhaoyue Sun, Gabriele Pergola, Byron C. Wallace, Yulan He

    Abstract: With the advent of large language models (LLMs), there has been growing interest in exploring their potential for medical applications. This research aims to investigate the ability of LLMs, specifically ChatGPT, in the context of pharmacovigilance event extraction, of which the main goal is to identify and extract adverse events or potential therapeutic events from textual medical sources. We con…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: 14 pages, 2 figures, accepted by EACL 2024

  11. arXiv:2402.12566  [pdf, other]

    cs.CL cs.LG

    GenAudit: Fixing Factual Errors in Language Model Outputs with Evidence

    Authors: Kundan Krishna, Sanjana Ramprasad, Prakhar Gupta, Byron C. Wallace, Zachary C. Lipton, Jeffrey P. Bigham

    Abstract: LLMs can generate factually incorrect statements even when provided access to reference documents. Such errors can be dangerous in high-stakes applications (e.g., document-grounded QA for healthcare or finance). We present GenAudit -- a tool intended to assist fact-checking LLM responses for document-grounded tasks. GenAudit suggests edits to the LLM response by revising or removing claims that ar…

    Submitted 16 March, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: Code and models available at https://genaudit.org

  12. arXiv:2402.11456  [pdf, other]

    cs.CL

    FactPICO: Factuality Evaluation for Plain Language Summarization of Medical Evidence

    Authors: Sebastian Antony Joseph, Lily Chen, Jan Trienes, Hannah Louisa Göke, Monika Coers, Wei Xu, Byron C Wallace, Junyi Jessy Li

    Abstract: Plain language summarization with LLMs can be useful for improving textual accessibility of technical content. But how factual are these summaries in a high-stakes domain like medicine? This paper presents FactPICO, a factuality benchmark for plain language summarization of medical texts describing randomized controlled trials (RCTs), which are the basis of evidence-based medicine and can directly…

    Submitted 4 June, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

    Comments: Preprint has been updated to match the final revision for ACL 2024

  13. arXiv:2402.10109  [pdf, other]

    cs.AI cs.CL cs.LG

    Towards Reducing Diagnostic Errors with Interpretable Risk Prediction

    Authors: Denis Jered McInerney, William Dickinson, Lucy C. Flynn, Andrea C. Young, Geoffrey S. Young, Jan-Willem van de Meent, Byron C. Wallace

    Abstract: Many diagnostic errors occur because clinicians cannot easily access relevant information in patient Electronic Health Records (EHRs). In this work we propose a method to use LLMs to identify pieces of evidence in patient EHR data that indicate increased or decreased risk of specific diagnoses; our ultimate aim is to increase access to evidence and reduce diagnostic errors. In particular, we propo…

    Submitted 19 March, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

  14. arXiv:2402.03509  [pdf, other]

    cs.CL cs.AI cs.LG

    Evaluating the Factuality of Zero-shot Summarizers Across Varied Domains

    Authors: Sanjana Ramprasad, Kundan Krishna, Zachary C Lipton, Byron C Wallace

    Abstract: Recent work has shown that large language models (LLMs) are capable of generating summaries zero-shot (i.e., without explicit supervision) that, under human assessment, are often comparable or even preferred to manually composed reference summaries. However, this prior work has focussed almost exclusively on evaluating news article summarization. How do zero-shot summarizers perform in other (pote…

    Submitted 5 February, 2024; originally announced February 2024.

  15. arXiv:2402.01700  [pdf]

    cs.CL cs.AI

    Question answering systems for health professionals at the point of care -- a systematic review

    Authors: Gregory Kell, Angus Roberts, Serge Umansky, Linglong Qian, Davide Ferrari, Frank Soboczenski, Byron Wallace, Nikhil Patel, Iain J Marshall

    Abstract: Objective: Question answering (QA) systems have the potential to improve the quality of clinical care by providing health professionals with the latest and most relevant evidence. However, QA systems have not been widely adopted. This systematic review aims to characterize current medical QA systems, assess their suitability for healthcare, and identify areas of improvement. Materials and method…

    Submitted 24 January, 2024; originally announced February 2024.

    Comments: Accepted to the Journal of the American Medical Informatics Association (JAMIA)

  16. arXiv:2401.16475  [pdf, other]

    cs.CL

    InfoLossQA: Characterizing and Recovering Information Loss in Text Simplification

    Authors: Jan Trienes, Sebastian Joseph, Jörg Schlötterer, Christin Seifert, Kyle Lo, Wei Xu, Byron C. Wallace, Junyi Jessy Li

    Abstract: Text simplification aims to make technical texts more accessible to laypeople but often results in deletion of information and vagueness. This work proposes InfoLossQA, a framework to characterize and recover simplification-induced information loss in the form of question-and-answer (QA) pairs. Building on the theory of Question Under Discussion, the QA pairs are designed to help readers deepen their…

    Submitted 4 June, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted at ACL 2024 (main conference)

  17. arXiv:2311.13978  [pdf, other]

    cs.LG eess.IV

    MedISure: Towards Assuring Machine Learning-based Medical Image Classifiers using Mixup Boundary Analysis

    Authors: Adam Byfield, William Poulett, Ben Wallace, Anusha Jose, Shatakshi Tyagi, Smita Shembekar, Adnan Qayyum, Junaid Qadir, Muhammad Bilal

    Abstract: Machine learning (ML) models are becoming integral in healthcare technologies, presenting a critical need for formal assurance to validate their safety, fairness, robustness, and trustworthiness. These models are inherently prone to errors, potentially posing serious risks to patient health and could even cause irreparable harm. Traditional software assurance techniques rely on fixed code and do n…

    Submitted 23 November, 2023; originally announced November 2023.

  18. arXiv:2311.12908  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Diffusion Model Alignment Using Direct Preference Optimization

    Authors: Bram Wallace, Meihua Dang, Rafael Rafailov, Linqi Zhou, Aaron Lou, Senthil Purushwalkam, Stefano Ermon, Caiming Xiong, Shafiq Joty, Nikhil Naik

    Abstract: Large language models (LLMs) are fine-tuned using human comparison data with Reinforcement Learning from Human Feedback (RLHF) methods to make them better aligned with users' preferences. In contrast to LLMs, human preference learning has not been widely explored in text-to-image diffusion models; the best existing approach is to fine-tune a pretrained model using carefully curated high quality im…

    Submitted 21 November, 2023; originally announced November 2023.

  19. arXiv:2311.11211  [pdf]

    cs.AI

    Leveraging Generative AI for Clinical Evidence Summarization Needs to Ensure Trustworthiness

    Authors: Gongbo Zhang, Qiao Jin, Denis Jered McInerney, Yong Chen, Fei Wang, Curtis L. Cole, Qian Yang, Yanshan Wang, Bradley A. Malin, Mor Peleg, Byron C. Wallace, Zhiyong Lu, Chunhua Weng, Yifan Peng

    Abstract: Evidence-based medicine promises to improve the quality of healthcare by empowering medical decisions and practices with the best available evidence. The rapid growth of medical evidence, which can be obtained from various sources, poses a challenge in collecting, appraising, and synthesizing the evidential information. Recent advancements in generative AI, exemplified by large language models, ho…

    Submitted 31 March, 2024; v1 submitted 18 November, 2023; originally announced November 2023.

  20. Future Lens: Anticipating Subsequent Tokens from a Single Hidden State

    Authors: Koyena Pal, Jiuding Sun, Andrew Yuan, Byron C. Wallace, David Bau

    Abstract: We conjecture that hidden state vectors corresponding to individual input tokens encode information sufficient to accurately predict several tokens ahead. More concretely, in this paper we ask: Given a hidden (internal) representation of a single token at position $t$ in an input, can we reliably anticipate the tokens that will appear at positions $\geq t + 2$? To test this, we measure linear appr…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: Accepted at CoNLL 2023

  21. arXiv:2310.15213  [pdf, other]

    cs.CL cs.LG

    Function Vectors in Large Language Models

    Authors: Eric Todd, Millicent L. Li, Arnab Sen Sharma, Aaron Mueller, Byron C. Wallace, David Bau

    Abstract: We report the presence of a simple neural mechanism that represents an input-output function as a vector within autoregressive transformer language models (LMs). Using causal mediation analysis on a diverse range of in-context-learning (ICL) tasks, we find that a small number of attention heads transport a compact representation of the demonstrated task, which we call a function vector (FV). FVs are…

    Submitted 25 February, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: ICLR 2024. 52 pages, 30 figures, 23 tables. Code and data at https://functions.baulab.info

  22. arXiv:2309.04550  [pdf, other]

    cs.CL

    Retrieving Evidence from EHRs with LLMs: Possibilities and Challenges

    Authors: Hiba Ahsan, Denis Jered McInerney, Jisoo Kim, Christopher Potter, Geoffrey Young, Silvio Amir, Byron C. Wallace

    Abstract: Unstructured data in Electronic Health Records (EHRs) often contains critical information -- complementary to imaging -- that could inform radiologists' diagnoses. But the large volume of notes often associated with patients together with time constraints renders manually identifying relevant evidence practically infeasible. In this work we propose and evaluate a zero-shot strategy for using LLMs…

    Submitted 10 June, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

  23. arXiv:2307.08920  [pdf, other]

    eess.SY cs.AI cs.LG

    Continuous-Time Reinforcement Learning: New Design Algorithms with Theoretical Insights and Performance Guarantees

    Authors: Brent A. Wallace, Jennie Si

    Abstract: Continuous-time nonlinear optimal control problems hold great promise in real-world applications. After decades of development, reinforcement learning (RL) has achieved some of the greatest successes as a general nonlinear control design method. However, a recent comprehensive analysis of state-of-the-art continuous-time RL (CT-RL) methods, namely, adaptive dynamic programming (ADP)-based CT-RL al…

    Submitted 17 July, 2023; originally announced July 2023.

  24. arXiv:2306.11270  [pdf, other]

    cs.CL cs.LG

    Evaluating the Zero-shot Robustness of Instruction-tuned Language Models

    Authors: Jiuding Sun, Chantal Shaib, Byron C. Wallace

    Abstract: Instruction fine-tuning has recently emerged as a promising approach for improving the zero-shot capabilities of Large Language Models (LLMs) on new tasks. This technique has shown particular strength in improving the performance of modestly sized LLMs, sometimes inducing performance competitive with much larger model variants. In this paper we ask two questions: (1) How sensitive are instruction-…

    Submitted 8 July, 2023; v1 submitted 19 June, 2023; originally announced June 2023.

  25. arXiv:2305.14296  [pdf, other]

    cs.CL cs.LG

    USB: A Unified Summarization Benchmark Across Tasks and Domains

    Authors: Kundan Krishna, Prakhar Gupta, Sanjana Ramprasad, Byron C. Wallace, Jeffrey P. Bigham, Zachary C. Lipton

    Abstract: While the NLP community has produced numerous summarization benchmarks, none provide the rich annotations required to simultaneously address many important problems related to control and reliability. We introduce a Wikipedia-derived benchmark, complemented by a rich set of crowd-sourced annotations, that supports $8$ interrelated tasks: (i) extractive summarization; (ii) abstractive summarization…

    Submitted 4 December, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: EMNLP Findings 2023 Camera Ready

  26. arXiv:2305.13693  [pdf, other]

    cs.CL

    Automated Metrics for Medical Multi-Document Summarization Disagree with Human Evaluations

    Authors: Lucy Lu Wang, Yulia Otmakhova, Jay DeYoung, Thinh Hung Truong, Bailey E. Kuehl, Erin Bransom, Byron C. Wallace

    Abstract: Evaluating multi-document summarization (MDS) quality is difficult. This is especially true in the case of MDS for biomedical literature reviews, where models must synthesize contradicting evidence reported across different documents. Prior work has shown that rather than performing the task, models may exploit shortcuts that are difficult to detect using standard n-gram similarity metrics such as…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: ACL 2023; Github: https://github.com/allenai/mslr-annotated-dataset

  27. arXiv:2305.12532  [pdf, other]

    cs.CL

    Multilingual Simplification of Medical Texts

    Authors: Sebastian Joseph, Kathryn Kazanas, Keziah Reina, Vishnesh J. Ramanathan, Wei Xu, Byron C. Wallace, Junyi Jessy Li

    Abstract: Automated text simplification aims to produce simple versions of complex texts. This task is especially useful in the medical domain, where the latest medical findings are typically communicated via complex and technical articles. This creates barriers for laypeople seeking access to up-to-date medical findings, consequently impeding progress on health literacy. Most existing work on medical text…

    Submitted 18 October, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

    Comments: This version will be in EMNLP 2023 main

  28. arXiv:2305.11828  [pdf, other]

    cs.CL cs.AI cs.HC

    Appraising the Potential Uses and Harms of LLMs for Medical Systematic Reviews

    Authors: Hye Sun Yun, Iain J. Marshall, Thomas A. Trikalinos, Byron C. Wallace

    Abstract: Medical systematic reviews play a vital role in healthcare decision making and policy. However, their production is time-consuming, limiting the availability of high-quality and up-to-date evidence summaries. Recent advancements in large language models (LLMs) offer the potential to automatically generate literature reviews on demand, addressing this issue. However, LLMs sometimes generate inaccur…

    Submitted 18 October, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: 18 pages, 2 figures, 8 tables. Accepted as an EMNLP 2023 main paper

  29. arXiv:2305.06299  [pdf, other]

    cs.CL

    Summarizing, Simplifying, and Synthesizing Medical Evidence Using GPT-3 (with Varying Success)

    Authors: Chantal Shaib, Millicent L. Li, Sebastian Joseph, Iain J. Marshall, Junyi Jessy Li, Byron C. Wallace

    Abstract: Large language models, particularly GPT-3, are able to produce high quality summaries of general domain news articles in few- and zero-shot settings. However, it is unclear if such models are similarly capable in more specialized, high-stakes domains such as biomedicine. In this paper, we enlist domain experts (individuals with medical training) to evaluate summaries of biomedical articles generat…

    Submitted 11 May, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

    Comments: Accepted short paper to ACL 2023

  30. arXiv:2305.05003  [pdf, other]

    cs.CL

    Revisiting Relation Extraction in the era of Large Language Models

    Authors: Somin Wadhwa, Silvio Amir, Byron C. Wallace

    Abstract: Relation extraction (RE) is the core NLP task of inferring semantic relationships between entities from text. Standard supervised RE techniques entail training modules to tag tokens comprising entity spans and then predict the relationship between them. Recent work has instead treated the problem as a sequence-to-sequence task, linearizing relations between entities as target strings to be…

    Submitted 8 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023

  31. arXiv:2305.03642  [pdf, other]

    cs.CL

    Jointly Extracting Interventions, Outcomes, and Findings from RCT Reports with LLMs

    Authors: Somin Wadhwa, Jay DeYoung, Benjamin Nye, Silvio Amir, Byron C. Wallace

    Abstract: Results from Randomized Controlled Trials (RCTs) establish the comparative effectiveness of interventions, and are in turn critical inputs for evidence-based care. However, results from RCTs are presented in (often unstructured) natural language articles describing the design, execution, and outcomes of trials; clinicians must manually extract findings pertaining to interventions and outcomes of i…

    Submitted 17 July, 2023; v1 submitted 5 May, 2023; originally announced May 2023.

    Comments: Accepted to MLHC 2023

  32. arXiv:2303.13703  [pdf, other]

    cs.CV cs.AI cs.LG

    End-to-End Diffusion Latent Optimization Improves Classifier Guidance

    Authors: Bram Wallace, Akash Gokul, Stefano Ermon, Nikhil Naik

    Abstract: Classifier guidance -- using the gradients of an image classifier to steer the generations of a diffusion model -- has the potential to dramatically expand the creative control over image generation and editing. However, currently classifier guidance requires either training new noise-aware models to obtain accurate gradients or using a one-step denoising approximation of the final generation, whi…

    Submitted 31 May, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

  33. arXiv:2303.05392  [pdf, other]

    cs.CL cs.IR cs.LG

    Automatically Summarizing Evidence from Clinical Trials: A Prototype Highlighting Current Challenges

    Authors: Sanjana Ramprasad, Denis Jered McInerney, Iain J. Marshall, Byron C. Wallace

    Abstract: We present TrialsSummarizer, a system that aims to automatically summarize evidence presented in the set of randomized controlled trials most relevant to a given query. Building on prior work, the system retrieves trial publications matching a query specifying a combination of condition, intervention(s), and outcome(s), and ranks these according to sample size and estimated study quality. The top-…

    Submitted 7 March, 2023; originally announced March 2023.

  34. arXiv:2302.12343  [pdf, other]

    cs.CL cs.AI cs.LG

    CHiLL: Zero-shot Custom Interpretable Feature Extraction from Clinical Notes with Large Language Models

    Authors: Denis Jered McInerney, Geoffrey Young, Jan-Willem van de Meent, Byron C. Wallace

    Abstract: We propose CHiLL (Crafting High-Level Latents), an approach for natural-language specification of features for linear models. CHiLL prompts LLMs with expert-crafted queries to generate interpretable features from health records. The resulting noisy labels are then used to train a simple linear classifier. Generating features based on queries to an LLM can empower physicians to use their domain exp…

    Submitted 19 October, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

    Comments: To be published at EMNLP Findings 2023

  35. arXiv:2302.05574  [pdf, other]

    cs.CL

    NapSS: Paragraph-level Medical Text Simplification via Narrative Prompting and Sentence-matching Summarization

    Authors: Junru Lu, Jiazheng Li, Byron C. Wallace, Yulan He, Gabriele Pergola

    Abstract: Accessing medical literature is difficult for laypeople as the content is written for specialists and contains medical jargon. Automated text simplification methods offer a potential means to address this issue. In this work, we propose a summarize-then-simplify two-stage strategy, which we call NapSS, identifying the relevant content to simplify while ensuring that the original narrative flow is…

    Submitted 10 February, 2023; originally announced February 2023.

    Comments: Findings of EACL 2023

  36. arXiv:2302.02169  [pdf, other]

    cs.LG cs.AI cs.CL

    How Many and Which Training Points Would Need to be Removed to Flip this Prediction?

    Authors: Jinghan Yang, Sarthak Jain, Byron C. Wallace

    Abstract: We consider the problem of identifying a minimal subset of training data $\mathcal{S}_t$ such that if the instances comprising $\mathcal{S}_t$ had been removed prior to training, the categorization of a given test point $x_t$ would have been different. Identifying such a set may be of interest for a few reasons. First, the cardinality of $\mathcal{S}_t$ provides a measure of robustness (if…

    Submitted 8 February, 2023; v1 submitted 4 February, 2023; originally announced February 2023.

    Comments: Accepted to EACL 2023

  37. arXiv:2301.13844  [pdf, other]

    cs.CL

    Do Multi-Document Summarization Models Synthesize?

    Authors: Jay DeYoung, Stephanie C. Martinez, Iain J. Marshall, Byron C. Wallace

    Abstract: Multi-document summarization entails producing concise synopses of collections of inputs. For some applications, the synopsis should accurately synthesize inputs with respect to a key property or aspect. For example, a synopsis of film reviews all written about a particular movie should reflect the average critic consensus. As a more consequential example, consider narrative summaries that…

    Submitted 31 January, 2023; originally announced January 2023.

    Comments: 22 Pages, 13 Figures, 22 Tables. ACL Formatted paper; expanded version of rejected ICLR submission (https://openreview.net/forum?id=1PTeB4MWCfU). Paper de-anonymized ahead of ICLR de-anonymization due to ACL policies/additional conference submission.

  38. arXiv:2212.01641  [pdf, other]

    cs.CL cs.LG

    Intermediate Entity-based Sparse Interpretable Representation Learning

    Authors: Diego Garcia-Olano, Yasumasa Onoe, Joydeep Ghosh, Byron C. Wallace

    Abstract: Interpretable entity representations (IERs) are sparse embeddings that are "human-readable" in that dimensions correspond to fine-grained entity types and values are predicted probabilities that a given entity is of the corresponding type. These methods perform well in zero-shot and low supervision settings. Compared to standard dense neural embeddings, such interpretable representations may permi…

    Submitted 3 December, 2022; originally announced December 2022.

    Comments: Accepted into BlackBox NLP Workshop at EMNLP 2022

  39. arXiv:2211.12446  [pdf, other]

    cs.CV cs.AI cs.LG

    EDICT: Exact Diffusion Inversion via Coupled Transformations

    Authors: Bram Wallace, Akash Gokul, Nikhil Naik

    Abstract: Finding an initial noise vector that produces an input image when fed into the diffusion process (known as inversion) is an important problem in denoising diffusion models (DDMs), with applications for real image editing. The state-of-the-art approach for real image editing with inversion uses denoising diffusion implicit models (DDIMs) to deterministically noise the image to the intermediate stat…

    Submitted 22 December, 2022; v1 submitted 22 November, 2022; originally announced November 2022.

    Comments: 24 pages, 22 figures. Code now available

  40. arXiv:2210.14177  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    Influence Functions for Sequence Tagging Models

    Authors: Sarthak Jain, Varun Manjunatha, Byron C. Wallace, Ani Nenkova

    Abstract: Many language tasks (e.g., Named Entity Recognition, Part-of-Speech tagging, and Semantic Role Labeling) are naturally framed as sequence tagging problems. However, there has been comparatively little work on interpretability methods for sequence tagging models. In this paper, we extend influence functions - which aim to trace predictions back to the training points that informed them - to sequenc…

    Submitted 25 October, 2022; originally announced October 2022.

    Comments: Accepted to Findings of EMNLP 2022

  41. arXiv:2210.12560  [pdf, other]

    cs.CL

    PHEE: A Dataset for Pharmacovigilance Event Extraction from Text

    Authors: Zhaoyue Sun, Jiazheng Li, Gabriele Pergola, Byron C. Wallace, Bino John, Nigel Greene, Joseph Kim, Yulan He

    Abstract: The primary goal of drug safety researchers and regulators is to promptly identify adverse drug reactions. Doing so may in turn prevent or reduce the harm to patients and ultimately improve public health. Evaluating and monitoring drug safety (i.e., pharmacovigilance) involves analyzing an ever growing collection of spontaneous reports from health professionals, physicians, and pharmacists, and in…

    Submitted 22 October, 2022; originally announced October 2022.

    Comments: 17 pages, 3 figures, EMNLP2022 accepted

  42. arXiv:2210.09291  [pdf, other]

    cs.HC

    Embodying the Glitch: Perspectives on Generative AI in Dance Practice

    Authors: Benedikte Wallace, Charles P. Martin

    Abstract: What role does the break from realism play in the potential for generative artificial intelligence as a creative tool? Through exploration of glitch, we examine the prospective value of these artefacts in creative practice. This paper describes findings from an exploration of AI-generated "mistakes" when using movement produced by a generative deep learning model as an inspiration source in dance…

    Submitted 5 October, 2022; originally announced October 2022.

  43. arXiv:2210.08145  [pdf, other]

    cs.CL

    Self-Repetition in Abstractive Neural Summarizers

    Authors: Nikita Salkar, Thomas Trikalinos, Byron C. Wallace, Ani Nenkova

    Abstract: We provide a quantitative and qualitative analysis of self-repetition in the output of neural summarizers. We measure self-repetition as the number of n-grams of length four or longer that appear in multiple outputs of the same system. We analyze the behavior of three popular architectures (BART, T5, and Pegasus), fine-tuned on five datasets. In a regression analysis, we find that the three archit…

    Submitted 14 October, 2022; originally announced October 2022.

  44. arXiv:2210.06565  [pdf, other]

    cs.LG cs.AI cs.CV eess.IV

    That's the Wrong Lung! Evaluating and Improving the Interpretability of Unsupervised Multimodal Encoders for Medical Data

    Authors: Denis Jered McInerney, Geoffrey Young, Jan-Willem van de Meent, Byron C. Wallace

    Abstract: Pretraining multimodal models on Electronic Health Records (EHRs) provides a means of learning representations that can transfer to downstream tasks with minimal supervision. Recent multimodal models induce soft local alignments between image regions and sentences. This is of particular interest in the medical domain, where alignments might highlight regions in an image relevant to specific phenom…

    Submitted 22 October, 2022; v1 submitted 12 October, 2022; originally announced October 2022.

    Journal ref: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

  45. arXiv:2210.06331  [pdf, other]

    cs.CL

    RedHOT: A Corpus of Annotated Medical Questions, Experiences, and Claims on Social Media

    Authors: Somin Wadhwa, Vivek Khetan, Silvio Amir, Byron Wallace

    Abstract: We present Reddit Health Online Talk (RedHOT), a corpus of 22,000 richly annotated social media posts from Reddit spanning 24 health conditions. Annotations include demarcations of spans corresponding to medical claims, personal experiences, and questions. We collect additional granular annotations on identified claims. Specifically, we mark snippets that describe patient Populations, Intervention…

    Submitted 7 February, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: Accepted to EACL 2023

  46. Neural Transformers for Intraductal Papillary Mucosal Neoplasms (IPMN) Classification in MRI images

    Authors: Federica Proietto Salanitri, Giovanni Bellitto, Simone Palazzo, Ismail Irmakci, Michael B. Wallace, Candice W. Bolan, Megan Engels, Sanne Hoogenboom, Marco Aldinucci, Ulas Bagci, Daniela Giordano, Concetto Spampinato

    Abstract: Early detection of precancerous cysts or neoplasms, i.e., Intraductal Papillary Mucosal Neoplasms (IPMN), in pancreas is a challenging and complex task, and it may lead to a more favourable outcome. Once detected, grading IPMNs accurately is also necessary, since low-risk IPMNs can be under surveillance program, while high-risk IPMNs have to be surgically resected before they turn into cancer. Cur…

    Submitted 21 June, 2022; originally announced June 2022.

  47. arXiv:2206.02696  [pdf, other]

    cs.CL

    Learning to Ask Like a Physician

    Authors: Eric Lehman, Vladislav Lialin, Katelyn Y. Legaspi, Anne Janelle R. Sy, Patricia Therese S. Pile, Nicole Rose I. Alberto, Richard Raymund R. Ragasa, Corinna Victoria M. Puyat, Isabelle Rose I. Alberto, Pia Gabrielle I. Alfonso, Marianne Taliño, Dana Moukheiber, Byron C. Wallace, Anna Rumshisky, Jenifer J. Liang, Preethi Raghavan, Leo Anthony Celi, Peter Szolovits

    Abstract: Existing question answering (QA) datasets derived from electronic health records (EHR) are artificially generated and consequently fail to capture realistic physician information needs. We present Discharge Summary Clinical Questions (DiSCQ), a newly curated question dataset composed of 2,000+ questions paired with the snippets of text (triggers) that prompted each question. The questions are gene…

    Submitted 6 June, 2022; originally announced June 2022.

  48. arXiv:2204.07562  [pdf, other]

    cs.CL

    Evaluating Factuality in Text Simplification

    Authors: Ashwin Devaraj, William Sheffield, Byron C. Wallace, Junyi Jessy Li

    Abstract: Automated simplification models aim to make input texts more readable. Such methods have the potential to make complex information accessible to a wider audience, e.g., providing access to recent medical literature which might otherwise be impenetrable for a lay reader. However, such models risk introducing errors into automatically simplified texts, for instance by inserting statements unsupporte…

    Submitted 15 April, 2022; originally announced April 2022.

    Comments: ACL 2022

  49. arXiv:2204.07030  [pdf, other]

    cs.CV cs.LG

    Activation Regression for Continuous Domain Generalization with Applications to Crop Classification

    Authors: Samar Khanna, Bram Wallace, Kavita Bala, Bharath Hariharan

    Abstract: Geographic variance in satellite imagery impacts the ability of machine learning models to generalise to new regions. In this paper, we model geographic generalisation in medium resolution Landsat-8 satellite imagery as a continuous domain adaptation problem, demonstrating how models generalise better with appropriate domain knowledge. We develop a dataset spatially distributed across the entire c…

    Submitted 14 April, 2022; originally announced April 2022.

  50. arXiv:2111.06012  [pdf, other]

    cs.CL cs.LG

    Kronecker Factorization for Preventing Catastrophic Forgetting in Large-scale Medical Entity Linking

    Authors: Denis Jered McInerney, Luyang Kong, Kristjan Arumae, Byron Wallace, Parminder Bhatia

    Abstract: Multi-task learning is useful in NLP because it is often practically desirable to have a single model that works across a range of tasks. In the medical domain, sequential training on tasks may sometimes be the only way to train models, either because access to the original (potentially sensitive) data is no longer available, or simply owing to the computational costs inherent to joint retraining.…

    Submitted 10 November, 2021; originally announced November 2021.