Search | arXiv e-print repository

Answering real-world clinical questions using large language model based systems

Authors: Yen Sia Low, Michael L. Jackson, Rebecca J. Hyde, Robert E. Brown, Neil M. Sanghavi, Julian D. Baldwin, C. William Pike, Jananee Muralidharan, Gavin Hui, Natasha Alexander, Hadeel Hassan, Rahul V. Nene, Morgan Pike, Courtney J. Pokrzywa, Shivam Vedak, Adam Paul Yan, Dong-han Yao, Amy R. Zipursky, Christina Dinh, Philip Ballentine, Dan C. Derieg, Vladimir Polony, Rehan N. Chawdry, Jordan Davies, Brigham B. Hyde, et al. (2 additional authors not shown)

Abstract: Evidence to guide healthcare decisions is often limited by a lack of relevant and trustworthy literature as well as difficulty in contextualizing existing research for a specific patient. Large language models (LLMs) could potentially address both challenges by either summarizing published literature or generating new studies based on real-world data (RWD). We evaluated the ability of five LLM-based systems in answering 50 clinical questions and had nine independent physicians review the responses for relevance, reliability, and actionability. As it stands, general-purpose LLMs (ChatGPT-4, Claude 3 Opus, Gemini Pro 1.5) rarely produced answers that were deemed relevant and evidence-based (2% - 10%). In contrast, retrieval augmented generation (RAG)-based and agentic LLM systems produced relevant and evidence-based answers for 24% (OpenEvidence) to 58% (ChatRWD) of questions. Only the agentic ChatRWD was able to answer novel questions compared to other LLMs (65% vs. 0-9%). These results suggest that while general-purpose LLMs should not be used as-is, a purpose-built system for evidence summarization based on RAG and one for generating novel evidence working synergistically would improve availability of pertinent evidence for patient care.

Submitted 29 June, 2024; originally announced July 2024.

Comments: 28 pages (2 figures, 3 tables) inclusive of 8 pages of supplemental materials (4 supplemental figures and 4 supplemental tables)

arXiv:2403.06955 [pdf, other]

Accurate Crystal Structure Prediction of New 2D Hybrid Organic Inorganic Perovskites

Authors: Nima Karimitari, William J. Baldwin, Evan W. Muller, Zachary J. L. Bare, W. Joshua Kennedy, Gábor Csányi, Christopher Sutton

Abstract: Low dimensional hybrid organic-inorganic perovskites (HOIPs) represent a promising class of electronically active materials for both light absorption and emission. The design space of HOIPs is extremely large, since a diverse space of organic cations can be combined with different inorganic frameworks. This immense design space allows for tunable electronic and mechanical properties, but also necessitates the development of new tools for in silico high throughput analysis of candidate structures. In this work, we present an accurate, efficient, transferable and widely applicable machine learning interatomic potential (MLIP) for predicting the structure of new 2D HOIPs. Using the MACE architecture, an MLIP is trained on 86 diverse experimentally reported HOIP structures. The model is tested on 73 unseen perovskite compositions, and achieves chemical accuracy with respect to the reference electronic structure method. Our model is then combined with a simple random structure search algorithm to predict the structure of hypothetical HOIPs given only the proposed composition. Success is demonstrated by correctly and reliably recovering the crystal structure of a set of experimentally known 2D perovskites. Such a random structure search is impossible with ab initio methods due to the associated computational cost, but is relatively inexpensive with the MACE potential. Finally, the procedure is used to predict the structure formed by a new organic cation with no previously known corresponding perovskite. Laboratory synthesis of the new hybrid perovskite confirms the accuracy of our prediction. This capability, applied at scale, enables efficient screening of thousands of combinations of organic cations and inorganic layers.

Submitted 11 March, 2024; originally announced March 2024.

Comments: 14 pages and 9 figures in the main text. Supplementary included in pdf

arXiv:1912.02736 [pdf, other]

Insights from BB-MAS -- A Large Dataset for Typing, Gait and Swipes of the Same Person on Desktop, Tablet and Phone

Authors: Amith K. Belman, Li Wang, S. S. Iyengar, Pawel Sniatala, Robert Wright, Robert Dora, Jacob Baldwin, Zhanpeng Jin, Vir V. Phoha

Abstract: Behavioral biometrics are key components in the landscape of research in continuous and active user authentication. However, there is a lack of large datasets with multiple activities, such as typing, gait and swipe performed by the same person. Furthermore, large datasets with multiple activities performed on multiple devices by the same person are non-existent. The difficulties of procuring devices, participants, designing protocol, secure storage and on-field hindrances may have contributed to this scarcity. The availability of such a dataset is crucial to forward the research in behavioral biometrics as usage of multiple devices by a person is common nowadays. Through this paper, we share our dataset, the details of its collection, features for each modality and our findings of how keystroke features vary across devices. We have collected data from 117 subjects for typing (both fixed and free text), gait (walking, upstairs and downstairs) and touch on Desktop, Tablet and Phone. The dataset consists of a total of about 3.5 million keystroke events, 57.1 million data-points for accelerometer and gyroscope each, and 1.7 million data-points for swipes, and enables future research to explore previously unexplored directions in inter-device and inter-modality biometrics. Our analysis on keystrokes reveals that in most cases, keyhold times are smaller but inter-key latencies are larger on hand-held devices when compared to desktop. We also present a detailed comparison with related datasets, possible research directions with the dataset, and lessons learnt from the data collection.

Submitted 19 December, 2019; v1 submitted 8 November, 2019; originally announced December 2019.

arXiv:1807.10442 [pdf]

doi 10.1007/978-3-319-73951-9_6

Leveraging Support Vector Machine for Opcode Density Based Detection of Crypto-Ransomware

Authors: James Baldwin, Ali Dehghantanha

Abstract: Ransomware is a significant global threat, with easy deployment due to the prevalent ransomware-as-a-service model. Machine learning algorithms incorporating the use of opcode characteristics and Support Vector Machine have been demonstrated to be a successful method for general malware detection. This research focuses on crypto-ransomware and uses static analysis of malicious and benign Portable Executable files to extract 443 opcodes across all samples, representing them as density histograms within the dataset. Using the SMO classifier and PUK kernel in the WEKA machine learning toolset it demonstrates that this methodology can achieve 100% precision when differentiating between ransomware and goodware, and 96.5% when differentiating between 5 cryptoransomware families and goodware. Moreover, 8 different attribute selection methods are evaluated to achieve significant feature reduction. Using the CorrelationAttributeEval method close to 100% precision can be maintained with a feature reduction of 59.5%. The CFSSubset filter achieves the highest feature reduction of 97.7% however with a slightly lower precision at 94.2%.

Submitted 27 July, 2018; originally announced July 2018.

Comments: 28 Pages

arXiv:1807.10440 [pdf]

doi 10.1007/978-3-319-73951-9_5

Leveraging Machine Learning Techniques for Windows Ransomware Network Traffic Detection

Authors: Omar M. K. Alhawi, James Baldwin, Ali Dehghantanha

Abstract: Ransomware has become a significant global threat with the ransomware-as-a-service model enabling easy availability and deployment, and the potential for high revenues creating a viable criminal business model. Individuals, private companies or public service providers e.g. healthcare or utilities companies can all become victims of ransomware attacks and consequently suffer severe disruption and financial loss. Although machine learning algorithms are already being used to detect ransomware, variants are being developed to specifically evade detection when using dynamic machine learning techniques. In this paper, we introduce NetConverse, a machine learning analysis of Windows ransomware network traffic to achieve a high, consistent detection rate. Using a dataset created from conversation-based network traffic features we achieved a true positive detection rate of 97.1% using the Decision Tree (J48) classifier.

Submitted 27 July, 2018; originally announced July 2018.

Comments: 11 Pages

arXiv:1807.10436 [pdf]

doi 10.1007/978-3-319-73951-9_16

Emerging from The Cloud: A Bibliometric Analysis of Cloud Forensics Studies

Authors: James Baldwin, Omar M. K. Alhawi, Simone Shaughnessy, Alex Akinbi, Ali Dehghantanha

Abstract: The emergence of cloud computing technologies has changed the way we store, retrieve, and archive our data. With the promise of unlimited, reliable and always-available storage, a lot of private and confidential data are now stored on different cloud platforms. Being such a gold mine of data, cloud platforms are among the most valuable targets for attackers. Therefore, many forensics investigators have tried to develop tools, tactics and procedures to collect, preserve, analyse and report evidence of attackers' activities on different cloud platforms. Despite the number of published articles there is not a bibliometric study that presents cloud forensics research trends. This paper aims to address this problem by providing a comprehensive assessment of cloud forensics research trends between 2009 and 2016. Moreover, we provide a classification of cloud forensics process to detect the most profound research areas and highlight remaining challenges.

Submitted 27 July, 2018; originally announced July 2018.

Comments: 22 Pages

arXiv:math/9801152 [pdf, ps, other]

On the classifiability of cellular automata

Authors: John T. Baldwin, Saharon Shelah

Abstract: Based on computer simulations Wolfram presented in several papers conjectured classifications of cellular automata into 4 types. He distinguishes the 4 classes of cellular automata by the evolution of the pattern generated by applying a cellular automaton to a finite input. Wolfram's qualitative classification is based on the examination of a large number of simulations. In addition to this classification based on the rate of growth, he conjectured a similar classification according to the eventual pattern. We consider here one formalization of his rate of growth suggestion. After completing our major results (based only on Wolfram's work), we investigated other contributions to the area and we report the relation of some of them to our discoveries.

Submitted 14 January, 1998; originally announced January 1998.

Report number: Shelah [BlSh:623]

Showing 1–7 of 7 results for author: Baldwin, J