Search | arXiv e-print repository

arXiv:2402.07023 [pdf, other]

Gemini Goes to Med School: Exploring the Capabilities of Multimodal Large Language Models on Medical Challenge Problems & Hallucinations

Authors: Ankit Pal, Malaikannan Sankarasubbu

Abstract: Large language models have the potential to be valuable in the healthcare industry, but it's crucial to verify their safety and effectiveness through rigorous evaluation. For this purpose, we comprehensively evaluated both open-source LLMs and Google's new multimodal LLM called Gemini across Medical reasoning, hallucination detection, and Medical Visual Question Answering tasks. While Gemini showed competence, it lagged behind state-of-the-art models like MedPaLM 2 and GPT-4 in diagnostic accuracy. Additionally, Gemini achieved an accuracy of 61.45% on the medical VQA dataset, significantly lower than GPT-4V's score of 88%. Our analysis revealed that Gemini is highly susceptible to hallucinations, overconfidence, and knowledge gaps, which indicate risks if deployed uncritically. We also performed a detailed analysis by medical subject and test type, providing actionable feedback for developers and clinicians. To mitigate risks, we applied prompting strategies that improved performance. Additionally, we facilitated future research and development by releasing a Python module for medical LLM evaluation and establishing a dedicated leaderboard on Hugging Face for medical domain LLMs. Python module can be found at https://github.com/promptslab/RosettaEval

Submitted 10 February, 2024; originally announced February 2024.

Comments: Preprint version, Under Review

arXiv:2307.15343 [pdf, other]

Med-HALT: Medical Domain Hallucination Test for Large Language Models

Authors: Ankit Pal, Logesh Kumar Umapathi, Malaikannan Sankarasubbu

Abstract: This research paper focuses on the challenges posed by hallucinations in large language models (LLMs), particularly in the context of the medical domain. Hallucination, wherein these models generate plausible yet unverified or incorrect information, can have serious consequences in healthcare applications. We propose a new benchmark and dataset, Med-HALT (Medical Domain Hallucination Test), designed specifically to evaluate and reduce hallucinations. Med-HALT provides a diverse multinational dataset derived from medical examinations across various countries and includes multiple innovative testing modalities. Med-HALT includes two categories of tests reasoning and memory-based hallucination tests, designed to assess LLMs's problem-solving and information retrieval abilities. Our study evaluated leading LLMs, including Text Davinci, GPT-3.5, LlaMa-2, MPT, and Falcon, revealing significant differences in their performance. The paper provides detailed insights into the dataset, promoting transparency and reproducibility. Through this work, we aim to contribute to the development of safer and more reliable language models in healthcare. Our benchmark can be found at medhalt.github.io

Submitted 14 October, 2023; v1 submitted 28 July, 2023; originally announced July 2023.

Comments: Accepted at EMNLP 2023 (The SIGNLL Conference on Computational Natural Language Learning)

arXiv:2211.07893 [pdf, other]

doi 10.1145/3533708

Federated Learning for Healthcare Domain - Pipeline, Applications and Challenges

Authors: Madhura Joshi, Ankit Pal, Malaikannan Sankarasubbu

Abstract: Federated learning is the process of developing machine learning models over datasets distributed across data centers such as hospitals, clinical research labs, and mobile devices while preventing data leakage. This survey examines previous research and studies on federated learning in the healthcare sector across a range of use cases and applications. Our survey shows what challenges, methods, and applications a practitioner should be aware of in the topic of federated learning. This paper aims to lay out existing research and list the possibilities of federated learning for healthcare industries.

Submitted 19 November, 2022; v1 submitted 14 November, 2022; originally announced November 2022.

Comments: ACM Transactions on Computing for Healthcare, Vol. 3, No. 4, Article 40. Publication date: October 2022

Journal ref: ACM Transactions on Computing for Healthcare, Vol. 3, No. 4, Article 40. Publication date: October 2022

arXiv:2203.14371 [pdf, other]

MedMCQA : A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering

Authors: Ankit Pal, Logesh Kumar Umapathi, Malaikannan Sankarasubbu

Abstract: This paper introduces MedMCQA, a new large-scale, Multiple-Choice Question Answering (MCQA) dataset designed to address real-world medical entrance exam questions. More than 194k high-quality AIIMS & NEET PG entrance exam MCQs covering 2.4k healthcare topics and 21 medical subjects are collected with an average token length of 12.77 and high topical diversity. Each sample contains a question, correct answer(s), and other options which requires a deeper language understanding as it tests the 10+ reasoning abilities of a model across a wide range of medical subjects & topics. A detailed explanation of the solution, along with the above information, is provided in this study.

Submitted 27 March, 2022; originally announced March 2022.

Comments: Proceedings of Machine Learning Research (PMLR), ACM Conference on Health, Inference, and Learning (CHIL) 2022

Journal ref: ACM Conference on Health, Inference, and Learning (CHIL) 2022

arXiv:2110.06606 [pdf, other]

doi 10.1038/s41598-021-84336-0

Bayesian Optimization of Bose-Einstein Condensates

Authors: Tamil Arasan Bakthavatchalam, Suriyadeepan Ramamoorthy, Malaikannan Sankarasubbu, Radha Ramaswamy, Vijayalakshmi Sethuraman

Abstract: Machine Learning methods are emerging as faster and efficient alternatives to numerical simulation techniques. The field of Scientific Computing has started adopting these data-driven approaches to faithfully model physical phenomena using scattered, noisy observations from coarse-grained grid-based simulations. In this paper, we investigate data-driven modelling of Bose-Einstein Condensates (BECs). In particular, we use Gaussian Processes (GPs) to model the ground state wave function of BECs as a function of scattering parameters from the dimensionless Gross Pitaveskii Equation (GPE). Experimental results illustrate the ability of GPs to accurately reproduce ground state wave functions using a limited number of data points from simulations. Consistent performance across different configurations of BECs, namely Scalar and Vectorial BECs generated under different potentials, including harmonic, double well and optical lattice potentials pronounces the versatility of our method. Comparison with existing data-driven models indicates that our model achieves similar accuracy with only a small fraction 1/50th of data points used by existing methods, in addition to modelling uncertainty from data. When used as a simulator post-training, our model generates ground state wave functions 36× faster than Trotter Suzuki, a numerical approximation technique that uses Imaginary time evolution. Our method is quite general; with minor changes it can be applied to similar quantum many-body problems.

Submitted 13 October, 2021; originally announced October 2021.

Report number: Scientific Reports 11, Article number: 5054 (2021)

Journal ref: Sci Rep 11, 5054 (2021)

arXiv:2109.10847 [pdf, other]

Small-Bench NLP: Benchmark for small single GPU trained models in Natural Language Processing

Authors: Kamal Raj Kanakarajan, Bhuvana Kundumani, Malaikannan Sankarasubbu

Abstract: Recent progress in the Natural Language Processing domain has given us several State-of-the-Art (SOTA) pretrained models which can be finetuned for specific tasks. These large models with billions of parameters trained on numerous GPUs/TPUs over weeks are leading in the benchmark leaderboards. In this paper, we discuss the need for a benchmark for cost and time effective smaller models trained on a single GPU. This will enable researchers with resource constraints experiment with novel and innovative ideas on tokenization, pretraining tasks, architecture, fine tuning methods etc. We set up Small-Bench NLP, a benchmark for small efficient neural language models trained on a single GPU. Small-Bench NLP benchmark comprises of eight NLP tasks on the publicly available GLUE datasets and a leaderboard to track the progress of the community. Our ELECTRA-DeBERTa (15M parameters) small model architecture achieves an average score of 81.53 which is comparable to that of BERT-Base's 82.20 (110M parameters). Our models, code and leaderboard are available at https://github.com/smallbenchnlp

Submitted 23 September, 2021; v1 submitted 22 September, 2021; originally announced September 2021.

arXiv:2010.02417 [pdf, other]

Pay Attention to the cough: Early Diagnosis of COVID-19 using Interpretable Symptoms Embeddings with Cough Sound Signal Processing

Authors: Ankit Pal, Malaikannan Sankarasubbu

Abstract: COVID-19 (coronavirus disease 2019) pandemic caused by SARS-CoV-2 has led to a treacherous and devastating catastrophe for humanity. At the time of writing, no specific antivirus drugs or vaccines are recommended to control infection transmission and spread. The current diagnosis of COVID-19 is done by Reverse-Transcription Polymer Chain Reaction (RT-PCR) testing. However, this method is expensive, time-consuming, and not easily available in straitened regions. An interpretable and COVID-19 diagnosis AI framework is devised and developed based on the cough sounds features and symptoms metadata to overcome these limitations. The proposed framework's performance was evaluated using a medical dataset containing Symptoms and Demographic data of 30000 audio segments, 328 cough sounds from 150 patients with four cough classes (COVID-19, Asthma, Bronchitis, and Healthy). Experiments' results show that the model captures the better and robust feature embedding to distinguish between COVID-19 patient coughs and several types of non-COVID-19 coughs with higher specificity and accuracy of 95.04 ± 0.18% and 96.83 ± 0.18% respectively, all the while maintaining interpretability.

Submitted 11 October, 2020; v1 submitted 5 October, 2020; originally announced October 2020.

Comments: Preprint Version

arXiv:2003.11644 [pdf, other]

doi 10.5220/0008940304940505

Multi-Label Text Classification using Attention-based Graph Neural Network

Authors: Ankit Pal, Muru Selvakumar, Malaikannan Sankarasubbu

Abstract: In Multi-Label Text Classification (MLTC), one sample can belong to more than one class. It is observed that most MLTC tasks, there are dependencies or correlations among labels. Existing methods tend to ignore the relationship among labels. In this paper, a graph attention network-based model is proposed to capture the attentive dependency structure among the labels. The graph attention network uses a feature matrix and a correlation matrix to capture and explore the crucial dependencies between the labels and generate classifiers for the task. The generated classifiers are applied to sentence feature vectors obtained from the text feature extraction network (BiLSTM) to enable end-to-end training. Attention allows the system to assign different weights to neighbor nodes per label, thus allowing it to learn the dependencies among labels implicitly. The results of the proposed model are validated on five real-world MLTC datasets. The proposed model achieves similar or better performance compared to the previous state-of-the-art models.

Submitted 22 March, 2020; originally announced March 2020.

Journal ref: 12th International Conference on Agents and Artificial Intelligence (ICAART 2020)
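
Illustrative note (not part of the arXiv listing): the mechanism summarized in the abstract above, where attention over a label graph produces one classifier vector per label and each classifier is scored against a sentence feature vector, can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: all names (`label_attention_layer`, `attn_vec`, `sentence_vec`) and all numbers are hypothetical, the learned linear projection of a real graph attention layer is omitted, and the sentence vector is a hand-written stand-in for a BiLSTM output.

```python
import math

def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def label_attention_layer(label_feats, adjacency, attn_vec):
    """One simplified attention pass over the label graph.

    label_feats: one feature vector per label (the feature matrix).
    adjacency:   adjacency[i][j] > 0 marks label j as a neighbour of
                 label i (derived from the correlation matrix).
    attn_vec:    hypothetical attention parameters of length 2 * d;
                 in a trained model these would be learned.
    Returns one classifier vector per label.
    """
    n = len(label_feats)
    d = len(label_feats[0])
    classifiers = []
    for i in range(n):
        # Each label attends to its neighbours and to itself.
        neighbours = [j for j in range(n) if adjacency[i][j] > 0 or j == i]
        scores = []
        for j in neighbours:
            s = dot(attn_vec, label_feats[i] + label_feats[j])
            scores.append(s if s > 0 else 0.2 * s)  # LeakyReLU
        weights = softmax(scores)
        # Weighted average of neighbour features.
        classifiers.append([
            sum(w * label_feats[j][k] for w, j in zip(weights, neighbours))
            for k in range(d)
        ])
    return classifiers

def predict(sentence_vec, classifiers):
    """Independent sigmoid score per label (multi-label output)."""
    return [1.0 / (1.0 + math.exp(-dot(c, sentence_vec))) for c in classifiers]

# Toy data: three labels, labels 0 and 1 correlated, label 2 isolated.
label_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adjacency = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
attn_vec = [0.5, -0.5, 0.5, 0.5]
sentence_vec = [2.0, -1.0]  # stand-in for a BiLSTM sentence feature vector

classifiers = label_attention_layer(label_feats, adjacency, attn_vec)
probs = predict(sentence_vec, classifiers)
print(probs)
```

Because every label receives its own sigmoid score, a sample can be assigned several labels at once; the attention weights are what let correlated labels share information in their classifier vectors.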

arXiv:1909.05624 [pdf, other]

Detecting Parking Spaces in a Parcel using Satellite Images

Authors: Murugesan Vadivel, SelvaKumar Murugan, Suriyadeepan Ramamoorthy, Vaidheeswaran Archana, Malaikannan Sankarasubbu

Abstract: Remote Sensing Images from satellites have been used in various domains for detecting and understanding structures on the ground surface. In this work, satellite images were used for localizing parking spaces and vehicles in parking lots for a given parcel using an RCNN based Neural Network Architectures. Parcel shapefiles and raster images from USGS image archive were used for developing images for both training and testing. Feature Pyramid based Mask RCNN yields average class accuracy of 97.56% for both parking spaces and vehicles

Submitted 30 January, 2020; v1 submitted 28 August, 2019; originally announced September 2019.

arXiv:1810.12698 [pdf, other]

Compositional Attention Networks for Interpretability in Natural Language Question Answering

Authors: Muru Selvakumar, Suriyadeepan Ramamoorthy, Vaidheeswaran Archana, Malaikannan Sankarasubbu

Abstract: MAC Net is a compositional attention network designed for Visual Question Answering. We propose a modified MAC net architecture for Natural Language Question Answering. Question Answering typically requires Language Understanding and multi-step Reasoning. MAC net's unique architecture - the separation between memory and control, facilitates data-driven iterative reasoning. This makes it an ideal candidate for solving tasks that involve logical reasoning. Our experiments with 20 bAbI tasks demonstrate the value of MAC net as a data-efficient and interpretable architecture for Natural Language Question Answering. The transparent nature of MAC net provides a highly granular view of the reasoning steps taken by the network in answering a query.

Submitted 30 October, 2018; originally announced October 2018.

Comments: 8 pages, 10 figures, 1 table

arXiv:1808.01128 [pdf, other]

PHI Scrubber: A Deep Learning Approach

Authors: Abhai Kollara Dilip, Kamal Raj K, Malaikannan Sankarasubbu

Abstract: Confidentiality of patient information is an essential part of Electronic Health Record System. Patient information, if exposed, can cause a serious damage to the privacy of individuals receiving healthcare. Hence it is important to remove such details from physician notes. A system is proposed which consists of a deep learning model where a de-convolutional neural network and bi-directional LSTM-CNN is used along with regular expressions to recognize and eliminate the individually identifiable information. This information is then removed from a medical practitioner's data which further allows the fair usage of such information among researchers and in clinical trials.

Submitted 3 August, 2018; originally announced August 2018.

arXiv:1807.09617 [pdf]

Convolutional Neural Networks In Classifying Cancer Through DNA Methylation

Authors: Soham Chatterjee, Archana Iyer, Satya Avva, Abhai Kollara, Malaikannan Sankarasubbu

Abstract: DNA Methylation has been the most extensively studied epigenetic mark. Usually a change in the genotype, DNA sequence, leads to a change in the phenotype, observable characteristics of the individual. But DNA methylation, which happens in the context of CpG (cytosine and guanine bases linked by phosphate backbone) dinucleotides, does not lead to a change in the original DNA sequence but has the potential to change the phenotype. DNA methylation is implicated in various biological processes and diseases including cancer. Hence there is a strong interest in understanding the DNA methylation patterns across various epigenetic related ailments in order to distinguish and diagnose the type of disease in its early stages. In this work, the relationship between methylated versus unmethylated CpG regions and cancer types is explored using Convolutional Neural Networks (CNNs). A CNN based Deep Learning model that can classify the cancer of a new DNA methylation profile based on the learning from publicly available DNA methylation datasets is then proposed.

Submitted 24 July, 2018; originally announced July 2018.

Showing 1–12 of 12 results for author: Sankarasubbu, M