Search | arXiv e-print repository

Large Language Models are Biased Because They Are Large Language Models

Abstract: This paper's primary goal is to provoke thoughtful discussion about the relationship between bias and fundamental properties of large language models. We do this by seeking to convince the reader that harmful biases are an inevitable consequence arising from the design of any large language model as LLMs are currently formulated. To the extent that this is true, it suggests that the problem of har… ▽ More This paper's primary goal is to provoke thoughtful discussion about the relationship between bias and fundamental properties of large language models. We do this by seeking to convince the reader that harmful biases are an inevitable consequence arising from the design of any large language model as LLMs are currently formulated. To the extent that this is true, it suggests that the problem of harmful bias cannot be properly addressed without a serious reconsideration of AI driven by LLMs, going back to the foundational assumptions underlying their design. △ Less

Submitted 18 June, 2024; originally announced June 2024.

Comments: Under review, 15 pages

arXiv:2406.06608 [pdf, other]

The Prompt Report: A Systematic Survey of Prompting Techniques

Authors: Sander Schulhoff, Michael Ilie, Nishant Balepur, Konstantine Kahadze, Amanda Liu, Chenglei Si, Yinheng Li, Aayush Gupta, HyoJung Han, Sevien Schulhoff, Pranav Sandeep Dulepet, Saurav Vidyadhara, Dayeon Ki, Sweta Agrawal, Chau Pham, Gerson Kroiz, Feileen Li, Hudson Tao, Ashay Srivastava, Hevander Da Costa, Saloni Gupta, Megan L. Rogers, Inna Goncearenco, Giuseppe Sarli, Igor Galynker , et al. (6 additional authors not shown)

Abstract: Generative Artificial Intelligence (GenAI) systems are being increasingly deployed across all parts of industry and research settings. Developers and end users interact with these systems through the use of prompting or prompt engineering. While prompting is a widespread and highly researched concept, there exists conflicting terminology and a poor ontological understanding of what constitutes a p… ▽ More Generative Artificial Intelligence (GenAI) systems are being increasingly deployed across all parts of industry and research settings. Developers and end users interact with these systems through the use of prompting or prompt engineering. While prompting is a widespread and highly researched concept, there exists conflicting terminology and a poor ontological understanding of what constitutes a prompt due to the area's nascency. This paper establishes a structured understanding of prompts, by assembling a taxonomy of prompting techniques and analyzing their use. We present a comprehensive vocabulary of 33 vocabulary terms, a taxonomy of 58 text-only prompting techniques, and 40 techniques for other modalities. We further present a meta-analysis of the entire literature on natural language prefix-prompting. △ Less

Submitted 16 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

arXiv:2311.01449 [pdf, other]

TopicGPT: A Prompt-based Topic Modeling Framework

Authors: Chau Minh Pham, Alexander Hoyle, Simeng Sun, Philip Resnik, Mohit Iyyer

Abstract: Topic modeling is a well-established technique for exploring text corpora. Conventional topic models (e.g., LDA) represent topics as bags of words that often require "reading the tea leaves" to interpret; additionally, they offer users minimal control over the formatting and specificity of resulting topics. To tackle these issues, we introduce TopicGPT, a prompt-based framework that uses large lan… ▽ More Topic modeling is a well-established technique for exploring text corpora. Conventional topic models (e.g., LDA) represent topics as bags of words that often require "reading the tea leaves" to interpret; additionally, they offer users minimal control over the formatting and specificity of resulting topics. To tackle these issues, we introduce TopicGPT, a prompt-based framework that uses large language models (LLMs) to uncover latent topics in a text collection. TopicGPT produces topics that align better with human categorizations compared to competing methods: it achieves a harmonic mean purity of 0.74 against human-annotated Wikipedia topics compared to 0.64 for the strongest baseline. Its topics are also interpretable, dispensing with ambiguous bags of words in favor of topics with natural language labels and associated free-form descriptions. Moreover, the framework is highly adaptable, allowing users to specify constraints and modify topics without the need for model retraining. By streamlining access to high-quality and interpretable topics, TopicGPT represents a compelling, human-centered approach to topic modeling. △ Less

Submitted 1 April, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

Comments: Accepted to NAACL 2024 (Main conference)

arXiv:2310.17774 [pdf, other]

Words, Subwords, and Morphemes: What Really Matters in the Surprisal-Reading Time Relationship?

Authors: Sathvik Nair, Philip Resnik

Abstract: An important assumption that comes with using LLMs on psycholinguistic data has gone unverified. LLM-based predictions are based on subword tokenization, not decomposition of words into morphemes. Does that matter? We carefully test this by comparing surprisal estimates using orthographic, morphological, and BPE tokenization against reading time data. Our results replicate previous findings and pr… ▽ More An important assumption that comes with using LLMs on psycholinguistic data has gone unverified. LLM-based predictions are based on subword tokenization, not decomposition of words into morphemes. Does that matter? We carefully test this by comparing surprisal estimates using orthographic, morphological, and BPE tokenization against reading time data. Our results replicate previous findings and provide evidence that in the aggregate, predictions using BPE tokenization do not suffer relative to morphological and orthographic segmentation. However, a finer-grained analysis points to potential issues with relying on BPE-based tokenization, as well as providing promising results involving morphologically-aware surprisal estimates and suggesting a new method for evaluating morphological prediction. △ Less

Submitted 26 October, 2023; originally announced October 2023.

Comments: Accepted to Findings of EMNLP 2023; 10 pages, 5 figures

arXiv:2309.15136 [pdf, other]

A multi-modal approach for identifying schizophrenia using cross-modal attention

Authors: Gowtham Premananth, Yashish M. Siriwardena, Philip Resnik, Carol Espy-Wilson

Abstract: This study focuses on how different modalities of human communication can be used to distinguish between healthy controls and subjects with schizophrenia who exhibit strong positive symptoms. We developed a multi-modal schizophrenia classification system using audio, video, and text. Facial action units and vocal tract variables were extracted as low-level features from video and audio respectivel… ▽ More This study focuses on how different modalities of human communication can be used to distinguish between healthy controls and subjects with schizophrenia who exhibit strong positive symptoms. We developed a multi-modal schizophrenia classification system using audio, video, and text. Facial action units and vocal tract variables were extracted as low-level features from video and audio respectively, which were then used to compute high-level coordination features that served as the inputs to the audio and video modalities. Context-independent text embeddings extracted from transcriptions of speech were used as the input for the text modality. The multi-modal system is developed by fusing a segment-to-session-level classifier for video and audio modalities with a text model based on a Hierarchical Attention Network (HAN) with cross-modal attention. The proposed multi-modal system outperforms the previous state-of-the-art multi-modal system by 8.53% in the weighted average F1 score. △ Less

Submitted 18 April, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

Comments: Accepted to Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2024

arXiv:2308.06459 [pdf, other]

Mainstream News Articles Co-Shared with Fake News Buttress Misinformation Narratives

Authors: Pranav Goel, Jon Green, David Lazer, Philip Resnik

Abstract: Most prior and current research examining misinformation spread on social media focuses on reports published by 'fake' news sources. These approaches fail to capture another potential form of misinformation with a much larger audience: factual news from mainstream sources ('real' news) repurposed to promote false or misleading narratives. We operationalize narratives using an existing unsupervised… ▽ More Most prior and current research examining misinformation spread on social media focuses on reports published by 'fake' news sources. These approaches fail to capture another potential form of misinformation with a much larger audience: factual news from mainstream sources ('real' news) repurposed to promote false or misleading narratives. We operationalize narratives using an existing unsupervised NLP technique and examine the narratives present in misinformation content. We find that certain articles from reliable outlets are shared by a disproportionate number of users who also shared fake news on Twitter. We consider these 'real' news articles to be co-shared with fake news. We show that co-shared articles contain existing misinformation narratives at a significantly higher rate than articles from the same reliable outlets that are not co-shared with fake news. This holds true even when articles are chosen following strict criteria of reliability for the outlets and after accounting for the alternative explanation of partisan curation of articles. For example, we observe that a recent article published by The Washington Post titled "Vaccinated people now make up a majority of COVID deaths" was disproportionately shared by Twitter users with a history of sharing anti-vaccine false news reports. Our findings suggest a strategic repurposing of mainstream news by conveyors of misinformation as a way to enhance the reach and persuasiveness of misleading narratives. We also conduct a comprehensive case study to help highlight how such repurposing can happen on Twitter as a consequence of the inclusion of particular narratives in the framing of mainstream news. △ Less

Submitted 11 August, 2023; originally announced August 2023.

arXiv:2305.14583 [pdf, other]

Natural Language Decompositions of Implicit Content Enable Better Text Representations

Authors: Alexander Hoyle, Rupak Sarkar, Pranav Goel, Philip Resnik

Abstract: When people interpret text, they rely on inferences that go beyond the observed language itself. Inspired by this observation, we introduce a method for the analysis of text that takes implicitly communicated content explicitly into account. We use a large language model to produce sets of propositions that are inferentially related to the text that has been observed, then validate the plausibilit… ▽ More When people interpret text, they rely on inferences that go beyond the observed language itself. Inspired by this observation, we introduce a method for the analysis of text that takes implicitly communicated content explicitly into account. We use a large language model to produce sets of propositions that are inferentially related to the text that has been observed, then validate the plausibility of the generated content via human judgments. Incorporating these explicit representations of implicit content proves useful in multiple problem settings that involve the human interpretation of utterances: assessing the similarity of arguments, making sense of a body of opinion data, and modeling legislative behavior. Our results suggest that modeling the meanings behind observed language, rather than the literal text alone, is a valuable direction for NLP and particularly its applications to social science. △ Less

Submitted 24 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: Accepted to EMNLP 2023 (Main conference)

arXiv:2211.07932 [pdf, other]

Using Open-Ended Stressor Responses to Predict Depressive Symptoms across Demographics

Authors: Carlos Aguirre, Mark Dredze, Philip Resnik

Abstract: Stressors are related to depression, but this relationship is complex. We investigate the relationship between open-ended text responses about stressors and depressive symptoms across gender and racial/ethnic groups. First, we use topic models and other NLP tools to find thematic and vocabulary differences when reporting stressors across demographic groups. We train language models using self-repo… ▽ More Stressors are related to depression, but this relationship is complex. We investigate the relationship between open-ended text responses about stressors and depressive symptoms across gender and racial/ethnic groups. First, we use topic models and other NLP tools to find thematic and vocabulary differences when reporting stressors across demographic groups. We train language models using self-reported stressors to predict depressive symptoms, finding a relationship between stressors and depression. Finally, we find that differences in stressors translate to downstream performance differences across demographic groups. △ Less

Submitted 15 November, 2022; originally announced November 2022.

Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2022, November 28th, 2022, New Orleans, United States & Virtual, http://www.ml4h.cc, 6 pages

arXiv:2210.16162 [pdf, other]

Are Neural Topic Models Broken?

Authors: Alexander Hoyle, Pranav Goel, Rupak Sarkar, Philip Resnik

Abstract: Recently, the relationship between automated and human evaluation of topic models has been called into question. Method developers have staked the efficacy of new topic model variants on automated measures, and their failure to approximate human preferences places these models on uncertain ground. Moreover, existing evaluation paradigms are often divorced from real-world use. Motivated by conten… ▽ More Recently, the relationship between automated and human evaluation of topic models has been called into question. Method developers have staked the efficacy of new topic model variants on automated measures, and their failure to approximate human preferences places these models on uncertain ground. Moreover, existing evaluation paradigms are often divorced from real-world use. Motivated by content analysis as a dominant real-world use case for topic modeling, we analyze two related aspects of topic models that affect their effectiveness and trustworthiness in practice for that purpose: the stability of their estimates and the extent to which the model's discovered categories align with human-determined categories in the data. We find that neural topic models fare worse in both respects compared to an established classical method. We take a step toward addressing both issues in tandem by demonstrating that a straightforward ensembling method can reliably outperform the members of the ensemble. △ Less

Submitted 28 October, 2022; originally announced October 2022.

Comments: Accepted to Findings of EMNLP 2022

arXiv:2107.02173 [pdf, other]

Is Automated Topic Model Evaluation Broken?: The Incoherence of Coherence

Authors: Alexander Hoyle, Pranav Goel, Denis Peskov, Andrew Hian-Cheong, Jordan Boyd-Graber, Philip Resnik

Abstract: Topic model evaluation, like evaluation of other unsupervised methods, can be contentious. However, the field has coalesced around automated estimates of topic coherence, which rely on the frequency of word co-occurrences in a reference corpus. Contemporary neural topic models surpass classical ones according to these metrics. At the same time, topic model evaluation suffers from a validation gap:… ▽ More Topic model evaluation, like evaluation of other unsupervised methods, can be contentious. However, the field has coalesced around automated estimates of topic coherence, which rely on the frequency of word co-occurrences in a reference corpus. Contemporary neural topic models surpass classical ones according to these metrics. At the same time, topic model evaluation suffers from a validation gap: automated coherence, developed for classical models, has not been validated using human experimentation for neural models. In addition, a meta-analysis of topic modeling literature reveals a substantial standardization gap in automated topic modeling benchmarks. To address the validation gap, we compare automated coherence with the two most widely accepted human judgment tasks: topic rating and word intrusion. To address the standardization gap, we systematically evaluate a dominant classical model and two state-of-the-art neural models on two commonly used datasets. Automated evaluations declare a winning model when corresponding human evaluations do not, calling into question the validity of fully automatic evaluations independent of human judgments. △ Less

Submitted 27 October, 2021; v1 submitted 5 July, 2021; originally announced July 2021.

Comments: Accepted to NeurIPS 2021 (spotlight presentation). CR version

arXiv:2104.13498 [pdf, other]

Towards Clinical Encounter Summarization: Learning to Compose Discharge Summaries from Prior Notes

Authors: Han-Chin Shing, Chaitanya Shivade, Nima Pourdamghani, Feng Nan, Philip Resnik, Douglas Oard, Parminder Bhatia

Abstract: The records of a clinical encounter can be extensive and complex, thus placing a premium on tools that can extract and summarize relevant information. This paper introduces the task of generating discharge summaries for a clinical encounter. Summaries in this setting need to be faithful, traceable, and scale to multiple long documents, motivating the use of extract-then-abstract summarization casc… ▽ More The records of a clinical encounter can be extensive and complex, thus placing a premium on tools that can extract and summarize relevant information. This paper introduces the task of generating discharge summaries for a clinical encounter. Summaries in this setting need to be faithful, traceable, and scale to multiple long documents, motivating the use of extract-then-abstract summarization cascades. We introduce two new measures, faithfulness and hallucination rate for evaluation in this task, which complement existing measures for fluency and informativeness. Results across seven medical sections and five models show that a summarization architecture that supports traceability yields promising results, and that a sentence-rewriting approach performs consistently on the measure used for faithfulness (faithfulness-adjusted $F_3$) over a diverse range of generated sections. △ Less

Submitted 27 April, 2021; originally announced April 2021.

arXiv:2010.02377 [pdf, other]

Improving Neural Topic Models using Knowledge Distillation

Authors: Alexander Hoyle, Pranav Goel, Philip Resnik

Abstract: Topic models are often used to identify human-interpretable topics to help make sense of large document collections. We use knowledge distillation to combine the best attributes of probabilistic topic models and pretrained transformers. Our modular method can be straightforwardly applied with any neural topic model to improve topic quality, which we demonstrate using two models having disparate ar… ▽ More Topic models are often used to identify human-interpretable topics to help make sense of large document collections. We use knowledge distillation to combine the best attributes of probabilistic topic models and pretrained transformers. Our modular method can be straightforwardly applied with any neural topic model to improve topic quality, which we demonstrate using two models having disparate architectures, obtaining state-of-the-art topic coherence. We show that our adaptable framework not only improves performance in the aggregate over all estimated topics, as is commonly reported, but also in head-to-head comparisons of aligned topics. △ Less

Submitted 5 October, 2020; originally announced October 2020.

Comments: Accepted to EMNLP 2020

arXiv:1911.06848 [pdf, other]

Assigning Medical Codes at the Encounter Level by Paying Attention to Documents

Authors: Han-Chin Shing, Guoli Wang, Philip Resnik

Abstract: The vast majority of research in computer assisted medical coding focuses on coding at the document level, but a substantial proportion of medical coding in the real world involves coding at the level of clinical encounters, each of which is typically represented by a potentially large set of documents. We introduce encounter-level document attention networks, which use hierarchical attention to e… ▽ More The vast majority of research in computer assisted medical coding focuses on coding at the document level, but a substantial proportion of medical coding in the real world involves coding at the level of clinical encounters, each of which is typically represented by a potentially large set of documents. We introduce encounter-level document attention networks, which use hierarchical attention to explicitly take the hierarchical structure of encounter documentation into account. Experimental evaluation demonstrates improvements in coding accuracy as well as facilitation of human reviewers in their ability to identify which documents within an encounter play a role in determining the encounter level codes. △ Less

Submitted 15 November, 2019; originally announced November 2019.

Comments: Machine Learning for Health (ML4H) at NeurIPS 2019 - Extended Abstract

arXiv:1809.03992 [pdf, other]

Assessing Composition in Sentence Vector Representations

Authors: Allyson Ettinger, Ahmed Elgohary, Colin Phillips, Philip Resnik

Abstract: An important component of achieving language understanding is mastering the composition of sentence meaning, but an immediate challenge to solving this problem is the opacity of sentence vector representations produced by current neural sentence composition models. We present a method to address this challenge, develo** tasks that directly target compositional meaning information in sentence vec… ▽ More An important component of achieving language understanding is mastering the composition of sentence meaning, but an immediate challenge to solving this problem is the opacity of sentence vector representations produced by current neural sentence composition models. We present a method to address this challenge, develo** tasks that directly target compositional meaning information in sentence vector representations with a high degree of precision and control. To enable the creation of these controlled tasks, we introduce a specialized sentence generation system that produces large, annotated sentence sets meeting specified syntactic, semantic and lexical constraints. We describe the details of the method and generation system, and then present results of experiments applying our method to probe for compositional information in embeddings from a number of existing sentence composition models. We find that the method is able to extract useful information about the differing capacities of these models, and we discuss the implications of our results with respect to these systems' capturing of sentence information. We make available for public use the datasets used for these experiments, as well as the generation system. △ Less

Submitted 11 September, 2018; originally announced September 2018.

Comments: COLING 2018

Journal ref: In Proceedings of the 27th International Conference on Computational Linguistics (pp. 1790-1801)

arXiv:1510.07586 [pdf, ps, other]

Parser for Abstract Meaning Representation using Learning to Search

Authors: Sudha Rao, Yogarshi Vyas, Hal Daume III, Philip Resnik

Abstract: We develop a novel technique to parse English sentences into Abstract Meaning Representation (AMR) using SEARN, a Learning to Search approach, by modeling the concept and the relation learning in a unified framework. We evaluate our parser on multiple datasets from varied domains and show an absolute improvement of 2% to 6% over the state-of-the-art. Additionally we show that using the most freque… ▽ More We develop a novel technique to parse English sentences into Abstract Meaning Representation (AMR) using SEARN, a Learning to Search approach, by modeling the concept and the relation learning in a unified framework. We evaluate our parser on multiple datasets from varied domains and show an absolute improvement of 2% to 6% over the state-of-the-art. Additionally we show that using the most frequent concept gives us a baseline that is stronger than the state-of-the-art for concept prediction. We plan to release our parser for public use. △ Less

Submitted 26 October, 2015; originally announced October 2015.

arXiv:1212.0927 [pdf, other]

Two Algorithms for Finding $k$ Shortest Paths of a Weighted Pushdown Automaton

Authors: Ke Wu, Philip Resnik

Abstract: We introduce efficient algorithms for finding the $k$ shortest paths of a weighted pushdown automaton (WPDA), a compact representation of a weighted set of strings with potential applications in parsing and machine translation. Both of our algorithms are derived from the same weighted deductive logic description of the execution of a WPDA using different search strategies. Experimental results sho… ▽ More We introduce efficient algorithms for finding the $k$ shortest paths of a weighted pushdown automaton (WPDA), a compact representation of a weighted set of strings with potential applications in parsing and machine translation. Both of our algorithms are derived from the same weighted deductive logic description of the execution of a WPDA using different search strategies. Experimental results show our Algorithm 2 adds very little overhead vs. the single shortest path algorithm, even with a large $k$. △ Less

Submitted 5 February, 2013; v1 submitted 4 December, 2012; originally announced December 2012.

arXiv:1105.5444 [pdf, ps]

doi 10.1613/jair.514

Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language

Authors: P. Resnik

Abstract: This article presents a measure of semantic similarity in an IS-A taxonomy based on the notion of shared information content. Experimental evaluation against a benchmark set of human similarity judgments demonstrates that the measure performs better than the traditional edge-counting approach. The article presents algorithms that take advantage of taxonomic similarity in resolving… ▽ More This article presents a measure of semantic similarity in an IS-A taxonomy based on the notion of shared information content. Experimental evaluation against a benchmark set of human similarity judgments demonstrates that the measure performs better than the traditional edge-counting approach. The article presents algorithms that take advantage of taxonomic similarity in resolving syntactic and semantic ambiguity, along with experimental results demonstrating their effectiveness. △ Less

Submitted 26 May, 2011; originally announced May 2011.

Journal ref: Journal Of Artificial Intelligence Research, Volume 11, pages 95-130, 1999

arXiv:cs/0008007 [pdf, ps, other]

Tagger Evaluation Given Hierarchical Tag Sets

Authors: I. Dan Melamed, Philip Resnik

Abstract: We present methods for evaluating human and automatic taggers that extend current practice in three ways. First, we show how to evaluate taggers that assign multiple tags to each test instance, even if they do not assign probabilities. Second, we show how to accommodate a common property of manually constructed ``gold standards'' that are typically used for objective evaluation, namely that ther… ▽ More We present methods for evaluating human and automatic taggers that extend current practice in three ways. First, we show how to evaluate taggers that assign multiple tags to each test instance, even if they do not assign probabilities. Second, we show how to accommodate a common property of manually constructed ``gold standards'' that are typically used for objective evaluation, namely that there is often more than one correct answer. Third, we show how to measure performance when the set of possible tags is tree-structured in an IS-A hierarchy. To illustrate how our methods can be used to measure inter-annotator agreement, we show how to compute the kappa coefficient over hierarchical tag sets. △ Less

Submitted 9 August, 2000; originally announced August 2000.

Comments: preprint is 7 pages, laid out differently than printed version

ACM Class: G.3; I.2.7; J.5

Journal ref: Computers and the Humanities 34(1-2). Special issue on SENSEVAL. pp. 79-84

arXiv:cmp-lg/9808003 [pdf, ps, other]

Parallel Strands: A Preliminary Investigation into Mining the Web for Bilingual Text

Authors: Philip Resnik

Abstract: Parallel corpora are a valuable resource for machine translation, but at present their availability and utility is limited by genre- and domain-specificity, licensing restrictions, and the basic difficulty of locating parallel texts in all but the most dominant of the world's languages. A parallel corpus resource not yet explored is the World Wide Web, which hosts an abundance of pages in parall… ▽ More Parallel corpora are a valuable resource for machine translation, but at present their availability and utility is limited by genre- and domain-specificity, licensing restrictions, and the basic difficulty of locating parallel texts in all but the most dominant of the world's languages. A parallel corpus resource not yet explored is the World Wide Web, which hosts an abundance of pages in parallel translation, offering a potential solution to some of these problems and unique opportunities of its own. This paper presents the necessary first step in that exploration: a method for automatically finding parallel translated documents on the Web. The technique is conceptually simple, fully language independent, and scalable, and preliminary evaluation results indicate that the method may be accurate enough to apply without human intervention. △ Less

Submitted 7 August, 1998; originally announced August 1998.

Comments: LaTeX2e, 11 pages, 7 eps figures; uses psfig, llncs.cls, theapa.sty. An Appendix at http://umiacs.umd.edu/~resnik/amta98/amta98_appendix.html contains test data

Report number: UMIACS TR 98-41

Journal ref: Proceedings of AMTA-98

arXiv:cmp-lg/9704001 [pdf, ps, other]

Evaluating Multilingual Gisting of Web Pages

Authors: Philip Resnik

Abstract: We describe a prototype system for multilingual gisting of Web pages, and present an evaluation methodology based on the notion of gisting as decision support. This evaluation paradigm is straightforward, rigorous, permits fair comparison of alternative approaches, and should easily generalize to evaluation in other situations where the user is faced with decision-making on the basis of informat… ▽ More We describe a prototype system for multilingual gisting of Web pages, and present an evaluation methodology based on the notion of gisting as decision support. This evaluation paradigm is straightforward, rigorous, permits fair comparison of alternative approaches, and should easily generalize to evaluation in other situations where the user is faced with decision-making on the basis of information in restricted or alternative form. △ Less

Submitted 7 April, 1997; originally announced April 1997.

Comments: 7 pages, uses psfig and aaai styles

Report number: CS-TR-3783/LAMP-TR-009/UMIACS-TR-97-39

arXiv:cmp-lg/9703005 [pdf, ps, other]

Semi-Automatic Acquisition of Domain-Specific Translation Lexicons

Authors: Philip Resnik, I. Dan Melamed

Abstract: We investigate the utility of an algorithm for translation lexicon acquisition (SABLE), used previously on a very large corpus to acquire general translation lexicons, when that algorithm is applied to a much smaller corpus to produce candidates for domain-specific translation lexicons. We investigate the utility of an algorithm for translation lexicon acquisition (SABLE), used previously on a very large corpus to acquire general translation lexicons, when that algorithm is applied to a much smaller corpus to produce candidates for domain-specific translation lexicons. △ Less

Submitted 27 March, 1997; originally announced March 1997.

Comments: 8 pages

Journal ref: Proceedings of the 5th ANLP Conference, 1997.

arXiv:cmp-lg/9511007 [pdf, ps, other]

Using Information Content to Evaluate Semantic Similarity in a Taxonomy

Authors: Philip Resnik

Abstract: This paper presents a new measure of semantic similarity in an IS-A taxonomy, based on the notion of information content. Experimental evaluation suggests that the measure performs encouragingly well (a correlation of r = 0.79 with a benchmark set of human similarity judgments, with an upper bound of r = 0.90 for human subjects performing the same task), and significantly better than the traditi… ▽ More This paper presents a new measure of semantic similarity in an IS-A taxonomy, based on the notion of information content. Experimental evaluation suggests that the measure performs encouragingly well (a correlation of r = 0.79 with a benchmark set of human similarity judgments, with an upper bound of r = 0.90 for human subjects performing the same task), and significantly better than the traditional edge counting approach (r = 0.66). △ Less

Submitted 29 November, 1995; originally announced November 1995.

Comments: 6 pages, 2 postscript figures, uses ijcai95.sty

Journal ref: Proceedings of the 14th International Joint Conference on Artificial Intelligence

arXiv:cmp-lg/9511006 [pdf, ps, other]

Disambiguating Noun Grou**s with Respect to WordNet Senses

Authors: Philip Resnik

Abstract: Word grou**s useful for language processing tasks are increasingly available, as thesauri appear on-line, and as distributional word clustering techniques improve. However, for many tasks, one is interested in relationships among word {\em senses}, not words. This paper presents a method for automatic sense disambiguation of nouns appearing within sets of related nouns --- the kind of data one… ▽ More Word grou**s useful for language processing tasks are increasingly available, as thesauri appear on-line, and as distributional word clustering techniques improve. However, for many tasks, one is interested in relationships among word {\em senses}, not words. This paper presents a method for automatic sense disambiguation of nouns appearing within sets of related nouns --- the kind of data one finds in on-line thesauri, or as the output of distributional clustering algorithms. Disambiguation is performed with respect to WordNet senses, which are fairly fine-grained; however, the method also permits the assignment of higher-level WordNet categories rather than sense labels. The method is illustrated primarily by example, though results of a more rigorous evaluation are also presented. △ Less

Submitted 29 November, 1995; originally announced November 1995.

Comments: LaTeX, 16 pages, uses breakcites.sty, authdate.sty

Journal ref: Proceedings of the 3rd Workshop on Very Large Corpora, MIT, 30 June 1995

arXiv:cmp-lg/9410026 [pdf, ps]

A Rule-Based Approach To Prepositional Phrase Attachment Disambiguation

Authors: Eric Brill, Philip Resnik

Abstract: In this paper, we describe a new corpus-based approach to prepositional phrase attachment disambiguation, and present results comparing performance of this algorithm with other corpus-based approaches to this problem. In this paper, we describe a new corpus-based approach to prepositional phrase attachment disambiguation, and present results comparing performance of this algorithm with other corpus-based approaches to this problem. △ Less

Submitted 25 October, 1994; originally announced October 1994.

Comments: 7 pages, compressed uuencoded postscript

Journal ref: COLING 1994

Showing 1–24 of 24 results for author: Resnik, P