Search | arXiv e-print repository

Robustness of Structured Data Extraction from In-plane Rotated Documents using Multi-Modal Large Language Models (LLM)

Authors: Anjanava Biswas, Wrick Talukdar

Abstract: Multi-modal large language models (LLMs) have shown remarkable performance in various natural language processing tasks, including data extraction from documents. However, the accuracy of these models can be significantly affected by document in-plane rotation, also known as skew, a common issue in real-world scenarios for scanned documents. This study investigates the impact of document skew on t… ▽ More Multi-modal large language models (LLMs) have shown remarkable performance in various natural language processing tasks, including data extraction from documents. However, the accuracy of these models can be significantly affected by document in-plane rotation, also known as skew, a common issue in real-world scenarios for scanned documents. This study investigates the impact of document skew on the data extraction accuracy of three state-of-the-art multi-modal LLMs: Anthropic Claude V3 Sonnet, GPT-4-Turbo, and Llava:v1.6. We focus on extracting specific entities from synthetically generated sample documents with varying degrees of skewness. The results demonstrate that document skew adversely affects the data extraction accuracy of all the tested LLMs, with the severity of the impact varying across models. We identify the safe in-plane rotation angles (SIPRA) for each model and investigate the effects of skew on model hallucinations. Furthermore, we explore existing skew detection and correction mechanisms and discuss their potential limitations. We propose alternative approaches, including develo** new multi-modal architectures that are inherently more robust to document skew and incorporating skewing techniques during the pre-training phase of the models. Additionally, we highlight the need for more comprehensive testing on a wider range of document quality and conditions to fully understand the challenges and opportunities associated with using multi-modal LLMs for information extraction in real-world scenarios. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: 20 pages, 6 figures

Journal ref: Journal of Artificial Intelligence Research: Vol. 4 (2024): No. 1, 176-195

arXiv:2406.06569 [pdf]

doi 10.38124/ijisrt/IJISRT24MAY2085

Enhancing Clinical Documentation with Synthetic Data: Leveraging Generative Models for Improved Accuracy

Authors: Anjanava Biswas, Wrick Talukdar

Abstract: Accurate and comprehensive clinical documentation is crucial for delivering high-quality healthcare, facilitating effective communication among providers, and ensuring compliance with regulatory requirements. However, manual transcription and data entry processes can be time-consuming, error-prone, and susceptible to inconsistencies, leading to incomplete or inaccurate medical records. This paper… ▽ More Accurate and comprehensive clinical documentation is crucial for delivering high-quality healthcare, facilitating effective communication among providers, and ensuring compliance with regulatory requirements. However, manual transcription and data entry processes can be time-consuming, error-prone, and susceptible to inconsistencies, leading to incomplete or inaccurate medical records. This paper proposes a novel approach to augment clinical documentation by leveraging synthetic data generation techniques to generate realistic and diverse clinical transcripts. We present a methodology that combines state-of-the-art generative models, such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), with real-world clinical transcript and other forms of clinical data to generate synthetic transcripts. These synthetic transcripts can then be used to supplement existing documentation workflows, providing additional training data for natural language processing models and enabling more accurate and efficient transcription processes. Through extensive experiments on a large dataset of anonymized clinical transcripts, we demonstrate the effectiveness of our approach in generating high-quality synthetic transcripts that closely resemble real-world data. Quantitative evaluation metrics, including perplexity scores and BLEU scores, as well as qualitative assessments by domain experts, validate the fidelity and utility of the generated synthetic transcripts. Our findings highlight synthetic data generation's potential to address clinical documentation challenges, improving patient care, reducing administrative burdens, and enhancing healthcare system efficiency. △ Less

Submitted 3 June, 2024; originally announced June 2024.

Journal ref: International Journal of Innovative Science and Research Technology: Vol. 9 (2024): No. 5, 1553-1566

arXiv:2406.01618 [pdf]

doi 10.56726/IRJMETS57269

FinEmbedDiff: A Cost-Effective Approach of Classifying Financial Documents with Vector Sampling using Multi-modal Embedding Models

Authors: Anjanava Biswas, Wrick Talukdar

Abstract: Accurate classification of multi-modal financial documents, containing text, tables, charts, and images, is crucial but challenging. Traditional text-based approaches often fail to capture the complex multi-modal nature of these documents. We propose FinEmbedDiff, a cost-effective vector sampling method that leverages pre-trained multi-modal embedding models to classify financial documents. Our ap… ▽ More Accurate classification of multi-modal financial documents, containing text, tables, charts, and images, is crucial but challenging. Traditional text-based approaches often fail to capture the complex multi-modal nature of these documents. We propose FinEmbedDiff, a cost-effective vector sampling method that leverages pre-trained multi-modal embedding models to classify financial documents. Our approach generates multi-modal embedding vectors for documents, and compares new documents with pre-computed class embeddings using vector similarity measures. Evaluated on a large dataset, FinEmbedDiff achieves competitive classification accuracy compared to state-of-the-art baselines while significantly reducing computational costs. The method exhibits strong generalization capabilities, making it a practical and scalable solution for real-world financial applications. △ Less

Submitted 28 May, 2024; originally announced June 2024.

Comments: 10 pages, 3 figures

Journal ref: International Research Journal of Modernization in Engineering Technology and Science: Vol. 06 (2024): No. 5, 6142-6152

arXiv:2406.01096 [pdf]

doi 10.38124/ijisrt/IJISRT24MAY2087

Synergizing Unsupervised and Supervised Learning: A Hybrid Approach for Accurate Natural Language Task Modeling

Authors: Wrick Talukdar, Anjanava Biswas

Abstract: While supervised learning models have shown remarkable performance in various natural language processing (NLP) tasks, their success heavily relies on the availability of large-scale labeled datasets, which can be costly and time-consuming to obtain. Conversely, unsupervised learning techniques can leverage abundant unlabeled text data to learn rich representations, but they do not directly optimi… ▽ More While supervised learning models have shown remarkable performance in various natural language processing (NLP) tasks, their success heavily relies on the availability of large-scale labeled datasets, which can be costly and time-consuming to obtain. Conversely, unsupervised learning techniques can leverage abundant unlabeled text data to learn rich representations, but they do not directly optimize for specific NLP tasks. This paper presents a novel hybrid approach that synergizes unsupervised and supervised learning to improve the accuracy of NLP task modeling. While supervised models excel at specific tasks, they rely on large labeled datasets. Unsupervised techniques can learn rich representations from abundant unlabeled text but don't directly optimize for tasks. Our methodology integrates an unsupervised module that learns representations from unlabeled corpora (e.g., language models, word embeddings) and a supervised module that leverages these representations to enhance task-specific models. We evaluate our approach on text classification and named entity recognition (NER), demonstrating consistent performance gains over supervised baselines. For text classification, contextual word embeddings from a language model pretrain a recurrent or transformer-based classifier. For NER, word embeddings initialize a BiLSTM sequence labeler. By synergizing techniques, our hybrid approach achieves SOTA results on benchmark datasets, paving the way for more data-efficient and robust NLP systems. △ Less

Submitted 3 June, 2024; originally announced June 2024.

Journal ref: International Journal of Innovative Science and Research Technology: Vol. 9 (2024): No. 5, 1499-1508

arXiv:2405.18346 [pdf]

doi 10.38124/ijisrt/IJISRT24MAY1483

Intelligent Clinical Documentation: Harnessing Generative AI for Patient-Centric Clinical Note Generation

Authors: Anjanava Biswas, Wrick Talukdar

Abstract: Comprehensive clinical documentation is crucial for effective healthcare delivery, yet it poses a significant burden on healthcare professionals, leading to burnout, increased medical errors, and compromised patient safety. This paper explores the potential of generative AI (Artificial Intelligence) to streamline the clinical documentation process, specifically focusing on generating SOAP (Subject… ▽ More Comprehensive clinical documentation is crucial for effective healthcare delivery, yet it poses a significant burden on healthcare professionals, leading to burnout, increased medical errors, and compromised patient safety. This paper explores the potential of generative AI (Artificial Intelligence) to streamline the clinical documentation process, specifically focusing on generating SOAP (Subjective, Objective, Assessment, Plan) and BIRP (Behavior, Intervention, Response, Plan) notes. We present a case study demonstrating the application of natural language processing (NLP) and automatic speech recognition (ASR) technologies to transcribe patient-clinician interactions, coupled with advanced prompting techniques to generate draft clinical notes using large language models (LLMs). The study highlights the benefits of this approach, including time savings, improved documentation quality, and enhanced patient-centered care. Additionally, we discuss ethical considerations, such as maintaining patient confidentiality and addressing model biases, underscoring the need for responsible deployment of generative AI in healthcare settings. The findings suggest that generative AI has the potential to revolutionize clinical documentation practices, alleviating administrative burdens and enabling healthcare professionals to focus more on direct patient care. △ Less

Submitted 28 May, 2024; originally announced May 2024.

Comments: 15 pages, 7 figures

Journal ref: International Journal of Innovative Science and Research Technology: Vol. 9 (2024): No. 5, 994-1008

Showing 1–5 of 5 results for author: Talukdar, W