Showing 1–24 of 24 results for author: Resnik, P

Searching in archive cs.
  1. arXiv:2406.13138  [pdf, ps, other]

    cs.CL cs.AI

    Large Language Models are Biased Because They Are Large Language Models

    Authors: Philip Resnik

    Abstract: This paper's primary goal is to provoke thoughtful discussion about the relationship between bias and fundamental properties of large language models. We do this by seeking to convince the reader that harmful biases are an inevitable consequence arising from the design of any large language model as LLMs are currently formulated. To the extent that this is true, it suggests that the problem of har…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Under review, 15 pages

  2. arXiv:2406.06608  [pdf, other]

    cs.CL cs.AI

    The Prompt Report: A Systematic Survey of Prompting Techniques

    Authors: Sander Schulhoff, Michael Ilie, Nishant Balepur, Konstantine Kahadze, Amanda Liu, Chenglei Si, Yinheng Li, Aayush Gupta, HyoJung Han, Sevien Schulhoff, Pranav Sandeep Dulepet, Saurav Vidyadhara, Dayeon Ki, Sweta Agrawal, Chau Pham, Gerson Kroiz, Feileen Li, Hudson Tao, Ashay Srivastava, Hevander Da Costa, Saloni Gupta, Megan L. Rogers, Inna Goncearenco, Giuseppe Sarli, Igor Galynker, et al. (6 additional authors not shown)

    Abstract: Generative Artificial Intelligence (GenAI) systems are being increasingly deployed across all parts of industry and research settings. Developers and end users interact with these systems through the use of prompting or prompt engineering. While prompting is a widespread and highly researched concept, there exists conflicting terminology and a poor ontological understanding of what constitutes a p…

    Submitted 16 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  3. arXiv:2311.01449  [pdf, other]

    cs.CL

    TopicGPT: A Prompt-based Topic Modeling Framework

    Authors: Chau Minh Pham, Alexander Hoyle, Simeng Sun, Philip Resnik, Mohit Iyyer

    Abstract: Topic modeling is a well-established technique for exploring text corpora. Conventional topic models (e.g., LDA) represent topics as bags of words that often require "reading the tea leaves" to interpret; additionally, they offer users minimal control over the formatting and specificity of resulting topics. To tackle these issues, we introduce TopicGPT, a prompt-based framework that uses large lan…

    Submitted 1 April, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

    Comments: Accepted to NAACL 2024 (Main conference)

  4. arXiv:2310.17774  [pdf, other]

    cs.CL

    Words, Subwords, and Morphemes: What Really Matters in the Surprisal-Reading Time Relationship?

    Authors: Sathvik Nair, Philip Resnik

    Abstract: An important assumption that comes with using LLMs on psycholinguistic data has gone unverified. LLM-based predictions are based on subword tokenization, not decomposition of words into morphemes. Does that matter? We carefully test this by comparing surprisal estimates using orthographic, morphological, and BPE tokenization against reading time data. Our results replicate previous findings and pr…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: Accepted to Findings of EMNLP 2023; 10 pages, 5 figures

  5. arXiv:2309.15136  [pdf, other]

    eess.SP cs.MM cs.SD eess.AS eess.IV

    A multi-modal approach for identifying schizophrenia using cross-modal attention

    Authors: Gowtham Premananth, Yashish M. Siriwardena, Philip Resnik, Carol Espy-Wilson

    Abstract: This study focuses on how different modalities of human communication can be used to distinguish between healthy controls and subjects with schizophrenia who exhibit strong positive symptoms. We developed a multi-modal schizophrenia classification system using audio, video, and text. Facial action units and vocal tract variables were extracted as low-level features from video and audio respectivel…

    Submitted 18 April, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

    Comments: Accepted to Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2024

  6. arXiv:2308.06459  [pdf, other]

    cs.SI

    Mainstream News Articles Co-Shared with Fake News Buttress Misinformation Narratives

    Authors: Pranav Goel, Jon Green, David Lazer, Philip Resnik

    Abstract: Most prior and current research examining misinformation spread on social media focuses on reports published by 'fake' news sources. These approaches fail to capture another potential form of misinformation with a much larger audience: factual news from mainstream sources ('real' news) repurposed to promote false or misleading narratives. We operationalize narratives using an existing unsupervised…

    Submitted 11 August, 2023; originally announced August 2023.

  7. arXiv:2305.14583  [pdf, other]

    cs.CL

    Natural Language Decompositions of Implicit Content Enable Better Text Representations

    Authors: Alexander Hoyle, Rupak Sarkar, Pranav Goel, Philip Resnik

    Abstract: When people interpret text, they rely on inferences that go beyond the observed language itself. Inspired by this observation, we introduce a method for the analysis of text that takes implicitly communicated content explicitly into account. We use a large language model to produce sets of propositions that are inferentially related to the text that has been observed, then validate the plausibilit…

    Submitted 24 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted to EMNLP 2023 (Main conference)

  8. arXiv:2211.07932  [pdf, other]

    cs.CL

    Using Open-Ended Stressor Responses to Predict Depressive Symptoms across Demographics

    Authors: Carlos Aguirre, Mark Dredze, Philip Resnik

    Abstract: Stressors are related to depression, but this relationship is complex. We investigate the relationship between open-ended text responses about stressors and depressive symptoms across gender and racial/ethnic groups. First, we use topic models and other NLP tools to find thematic and vocabulary differences when reporting stressors across demographic groups. We train language models using self-repo…

    Submitted 15 November, 2022; originally announced November 2022.

    Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2022, November 28th, 2022, New Orleans, United States & Virtual, http://www.ml4h.cc, 6 pages

  9. arXiv:2210.16162  [pdf, other]

    cs.CL cs.HC

    Are Neural Topic Models Broken?

    Authors: Alexander Hoyle, Pranav Goel, Rupak Sarkar, Philip Resnik

    Abstract: Recently, the relationship between automated and human evaluation of topic models has been called into question. Method developers have staked the efficacy of new topic model variants on automated measures, and their failure to approximate human preferences places these models on uncertain ground. Moreover, existing evaluation paradigms are often divorced from real-world use. Motivated by conten…

    Submitted 28 October, 2022; originally announced October 2022.

    Comments: Accepted to Findings of EMNLP 2022

  10. arXiv:2107.02173  [pdf, other]

    cs.CL cs.LG

    Is Automated Topic Model Evaluation Broken?: The Incoherence of Coherence

    Authors: Alexander Hoyle, Pranav Goel, Denis Peskov, Andrew Hian-Cheong, Jordan Boyd-Graber, Philip Resnik

    Abstract: Topic model evaluation, like evaluation of other unsupervised methods, can be contentious. However, the field has coalesced around automated estimates of topic coherence, which rely on the frequency of word co-occurrences in a reference corpus. Contemporary neural topic models surpass classical ones according to these metrics. At the same time, topic model evaluation suffers from a validation gap:…

    Submitted 27 October, 2021; v1 submitted 5 July, 2021; originally announced July 2021.

    Comments: Accepted to NeurIPS 2021 (spotlight presentation). CR version

  11. arXiv:2104.13498  [pdf, other]

    cs.CL cs.LG

    Towards Clinical Encounter Summarization: Learning to Compose Discharge Summaries from Prior Notes

    Authors: Han-Chin Shing, Chaitanya Shivade, Nima Pourdamghani, Feng Nan, Philip Resnik, Douglas Oard, Parminder Bhatia

    Abstract: The records of a clinical encounter can be extensive and complex, thus placing a premium on tools that can extract and summarize relevant information. This paper introduces the task of generating discharge summaries for a clinical encounter. Summaries in this setting need to be faithful, traceable, and scale to multiple long documents, motivating the use of extract-then-abstract summarization casc…

    Submitted 27 April, 2021; originally announced April 2021.

  12. arXiv:2010.02377  [pdf, other]

    cs.CL cs.IR cs.LG

    Improving Neural Topic Models using Knowledge Distillation

    Authors: Alexander Hoyle, Pranav Goel, Philip Resnik

    Abstract: Topic models are often used to identify human-interpretable topics to help make sense of large document collections. We use knowledge distillation to combine the best attributes of probabilistic topic models and pretrained transformers. Our modular method can be straightforwardly applied with any neural topic model to improve topic quality, which we demonstrate using two models having disparate ar…

    Submitted 5 October, 2020; originally announced October 2020.

    Comments: Accepted to EMNLP 2020

  13. arXiv:1911.06848  [pdf, other]

    cs.CL cs.LG

    Assigning Medical Codes at the Encounter Level by Paying Attention to Documents

    Authors: Han-Chin Shing, Guoli Wang, Philip Resnik

    Abstract: The vast majority of research in computer assisted medical coding focuses on coding at the document level, but a substantial proportion of medical coding in the real world involves coding at the level of clinical encounters, each of which is typically represented by a potentially large set of documents. We introduce encounter-level document attention networks, which use hierarchical attention to e…

    Submitted 15 November, 2019; originally announced November 2019.

    Comments: Machine Learning for Health (ML4H) at NeurIPS 2019 - Extended Abstract

  14. arXiv:1809.03992  [pdf, other]

    cs.CL

    Assessing Composition in Sentence Vector Representations

    Authors: Allyson Ettinger, Ahmed Elgohary, Colin Phillips, Philip Resnik

    Abstract: An important component of achieving language understanding is mastering the composition of sentence meaning, but an immediate challenge to solving this problem is the opacity of sentence vector representations produced by current neural sentence composition models. We present a method to address this challenge, developing tasks that directly target compositional meaning information in sentence vec…

    Submitted 11 September, 2018; originally announced September 2018.

    Comments: COLING 2018

    Journal ref: In Proceedings of the 27th International Conference on Computational Linguistics (pp. 1790-1801)

  15. arXiv:1510.07586  [pdf, ps, other]

    cs.CL

    Parser for Abstract Meaning Representation using Learning to Search

    Authors: Sudha Rao, Yogarshi Vyas, Hal Daumé III, Philip Resnik

    Abstract: We develop a novel technique to parse English sentences into Abstract Meaning Representation (AMR) using SEARN, a Learning to Search approach, by modeling the concept and the relation learning in a unified framework. We evaluate our parser on multiple datasets from varied domains and show an absolute improvement of 2% to 6% over the state-of-the-art. Additionally we show that using the most freque…

    Submitted 26 October, 2015; originally announced October 2015.

  16. arXiv:1212.0927  [pdf, other]

    cs.CL cs.DS cs.FL

    Two Algorithms for Finding $k$ Shortest Paths of a Weighted Pushdown Automaton

    Authors: Ke Wu, Philip Resnik

    Abstract: We introduce efficient algorithms for finding the $k$ shortest paths of a weighted pushdown automaton (WPDA), a compact representation of a weighted set of strings with potential applications in parsing and machine translation. Both of our algorithms are derived from the same weighted deductive logic description of the execution of a WPDA using different search strategies. Experimental results sho…

    Submitted 5 February, 2013; v1 submitted 4 December, 2012; originally announced December 2012.

  17. Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language

    Authors: P. Resnik

    Abstract: This article presents a measure of semantic similarity in an IS-A taxonomy based on the notion of shared information content. Experimental evaluation against a benchmark set of human similarity judgments demonstrates that the measure performs better than the traditional edge-counting approach. The article presents algorithms that take advantage of taxonomic similarity in resolving…

    Submitted 26 May, 2011; originally announced May 2011.

    Journal ref: Journal of Artificial Intelligence Research, Volume 11, pages 95-130, 1999

  18. arXiv:cs/0008007  [pdf, ps, other]

    cs.CL

    Tagger Evaluation Given Hierarchical Tag Sets

    Authors: I. Dan Melamed, Philip Resnik

    Abstract: We present methods for evaluating human and automatic taggers that extend current practice in three ways. First, we show how to evaluate taggers that assign multiple tags to each test instance, even if they do not assign probabilities. Second, we show how to accommodate a common property of manually constructed "gold standards" that are typically used for objective evaluation, namely that ther…

    Submitted 9 August, 2000; originally announced August 2000.

    Comments: preprint is 7 pages, laid out differently than printed version

    ACM Class: G.3; I.2.7; J.5

    Journal ref: Computers and the Humanities 34(1-2). Special issue on SENSEVAL. pp. 79-84

  19. Parallel Strands: A Preliminary Investigation into Mining the Web for Bilingual Text

    Authors: Philip Resnik

    Abstract: Parallel corpora are a valuable resource for machine translation, but at present their availability and utility is limited by genre- and domain-specificity, licensing restrictions, and the basic difficulty of locating parallel texts in all but the most dominant of the world's languages. A parallel corpus resource not yet explored is the World Wide Web, which hosts an abundance of pages in parall…

    Submitted 7 August, 1998; originally announced August 1998.

    Comments: LaTeX2e, 11 pages, 7 eps figures; uses psfig, llncs.cls, theapa.sty. An Appendix at http://umiacs.umd.edu/~resnik/amta98/amta98_appendix.html contains test data

    Report number: UMIACS TR 98-41

    Journal ref: Proceedings of AMTA-98

  20. Evaluating Multilingual Gisting of Web Pages

    Authors: Philip Resnik

    Abstract: We describe a prototype system for multilingual gisting of Web pages, and present an evaluation methodology based on the notion of gisting as decision support. This evaluation paradigm is straightforward, rigorous, permits fair comparison of alternative approaches, and should easily generalize to evaluation in other situations where the user is faced with decision-making on the basis of informat…

    Submitted 7 April, 1997; originally announced April 1997.

    Comments: 7 pages, uses psfig and aaai styles

    Report number: CS-TR-3783/LAMP-TR-009/UMIACS-TR-97-39

  21. Semi-Automatic Acquisition of Domain-Specific Translation Lexicons

    Authors: Philip Resnik, I. Dan Melamed

    Abstract: We investigate the utility of an algorithm for translation lexicon acquisition (SABLE), used previously on a very large corpus to acquire general translation lexicons, when that algorithm is applied to a much smaller corpus to produce candidates for domain-specific translation lexicons.

    Submitted 27 March, 1997; originally announced March 1997.

    Comments: 8 pages

    Journal ref: Proceedings of the 5th ANLP Conference, 1997.

  22. Using Information Content to Evaluate Semantic Similarity in a Taxonomy

    Authors: Philip Resnik

    Abstract: This paper presents a new measure of semantic similarity in an IS-A taxonomy, based on the notion of information content. Experimental evaluation suggests that the measure performs encouragingly well (a correlation of r = 0.79 with a benchmark set of human similarity judgments, with an upper bound of r = 0.90 for human subjects performing the same task), and significantly better than the traditi…

    Submitted 29 November, 1995; originally announced November 1995.

    Comments: 6 pages, 2 postscript figures, uses ijcai95.sty

    Journal ref: Proceedings of the 14th International Joint Conference on Artificial Intelligence

  23. Disambiguating Noun Groupings with Respect to WordNet Senses

    Authors: Philip Resnik

    Abstract: Word groupings useful for language processing tasks are increasingly available, as thesauri appear on-line, and as distributional word clustering techniques improve. However, for many tasks, one is interested in relationships among word senses, not words. This paper presents a method for automatic sense disambiguation of nouns appearing within sets of related nouns --- the kind of data one…

    Submitted 29 November, 1995; originally announced November 1995.

    Comments: LaTeX, 16 pages, uses breakcites.sty, authdate.sty

    Journal ref: Proceedings of the 3rd Workshop on Very Large Corpora, MIT, 30 June 1995

  24. arXiv:cmp-lg/9410026  [pdf, ps]

    cs.CL

    A Rule-Based Approach To Prepositional Phrase Attachment Disambiguation

    Authors: Eric Brill, Philip Resnik

    Abstract: In this paper, we describe a new corpus-based approach to prepositional phrase attachment disambiguation, and present results comparing performance of this algorithm with other corpus-based approaches to this problem.

    Submitted 25 October, 1994; originally announced October 1994.

    Comments: 7 pages, compressed uuencoded postscript

    Journal ref: COLING 1994