Evaluation of General Large Language Models in Contextually Assessing Semantic Concepts Extracted from Adult Critical Care Electronic Health Record Notes
Authors:
Darren Liu,
Cheng Ding,
Delgersuren Bold,
Monique Bouvier,
Jiaying Lu,
Benjamin Shickel,
Craig S. Jabaley,
Wenhui Zhang,
Soo** Park,
Michael J. Young,
Mark S. Wainwright,
Gilles Clermont,
Parisa Rashidi,
Eric S. Rosenthal,
Laurie Dimisko,
Ran Xiao,
Joo Heung Yoon,
Carl Yang,
Xiao Hu
Abstract:
The field of healthcare has increasingly turned its focus towards Large Language Models (LLMs) due to their remarkable performance. However, their performance in actual clinical applications has been underexplored. Traditional evaluations based on question-answering tasks don't fully capture the nuanced contexts. This gap highlights the need for more in-depth and practical assessments of LLMs in real-world healthcare settings. Objective: We sought to evaluate the performance of LLMs in the complex clinical context of adult critical care medicine using systematic and comprehensible analytic methods, including clinician annotation and adjudication. Methods: We investigated the performance of three general LLMs in understanding and processing real-world clinical notes. Concepts from 150 clinical notes were identified by MetaMap and then labeled by 9 clinicians. Each LLM's proficiency was evaluated by identifying the temporality and negation of these concepts using different prompts for an in-depth analysis. Results: GPT-4 showed overall superior performance compared to other LLMs. In contrast, both GPT-3.5 and text-davinci-003 exhibited enhanced performance when the appropriate prompting strategies were employed. The GPT family models demonstrated considerable efficiency, evidenced by their cost-effectiveness and time-saving capabilities. Conclusion: A comprehensive qualitative performance evaluation framework for LLMs is developed and operationalized. This framework goes beyond singular performance aspects. With expert annotations, this methodology not only validates LLMs' capabilities in processing complex medical data but also establishes a benchmark for future LLM evaluations across specialized domains.
Submitted 24 January, 2024;
originally announced January 2024.
Simultaneous games with purchase of randomly supplied perfect information: Oracle Games
Authors:
Matthew J. Young,
Andrew Belmonte
Abstract:
We study the role of costly information in non-cooperative two-player games when an extrinsic third-party information broker is introduced asymmetrically, allowing one player to obtain information about the other player's action. This broker or "oracle" is defined by a probability of response, supplying correct information randomly; the informed player can pay more for a higher probability of response. We determine the necessary and sufficient conditions for strategy profiles to be equilibria, in terms of how both players change their strategies in response to the existence of the oracle, as determined by its cost of information function. For mixed strategy equilibria, there is a continuous change as information becomes cheaper, with clear transitions occurring at critical nodes at which pure strategies become dominated (or undominated). These nodes separate distinct responses to the information for sale, alternating between regions where the paying player increases the amount of information purchased, and regions where the other player moves away from riskier strategies, in favor of safer bets that minimize losses. We derive conditions for these responses by defining a value of information.
Submitted 19 February, 2020;
originally announced February 2020.