Search | arXiv e-print repository

WangLab at MEDIQA-M3G 2024: Multimodal Medical Answer Generation using Large Language Models

Authors: Ronald Xie, Steven Palayew, Augustin Toma, Gary Bader, Bo Wang

Abstract: This paper outlines our submission to the MEDIQA2024 Multilingual and Multimodal Medical Answer Generation (M3G) shared task. We report results for two standalone solutions under the English category of the task, the first involving two consecutive API calls to the Claude 3 Opus API and the second involving training an image-disease label joint embedding in the style of CLIP for image classificati… ▽ More This paper outlines our submission to the MEDIQA2024 Multilingual and Multimodal Medical Answer Generation (M3G) shared task. We report results for two standalone solutions under the English category of the task, the first involving two consecutive API calls to the Claude 3 Opus API and the second involving training an image-disease label joint embedding in the style of CLIP for image classification. These two solutions scored 1st and 2nd place respectively on the competition leaderboard, substantially outperforming the next best solution. Additionally, we discuss insights gained from post-competition experiments. While the performance of these two solutions have significant room for improvement due to the difficulty of the shared task and the challenging nature of medical visual question answering in general, we identify the multi-stage LLM approach and the CLIP image classification approach as promising avenues for further investigation. △ Less

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.14544 [pdf, other]

WangLab at MEDIQA-CORR 2024: Optimized LLM-based Programs for Medical Error Detection and Correction

Authors: Augustin Toma, Ronald Xie, Steven Palayew, Patrick R. Lawler, Bo Wang

Abstract: Medical errors in clinical text pose significant risks to patient safety. The MEDIQA-CORR 2024 shared task focuses on detecting and correcting these errors across three subtasks: identifying the presence of an error, extracting the erroneous sentence, and generating a corrected sentence. In this paper, we present our approach that achieved top performance in all three subtasks. For the MS dataset,… ▽ More Medical errors in clinical text pose significant risks to patient safety. The MEDIQA-CORR 2024 shared task focuses on detecting and correcting these errors across three subtasks: identifying the presence of an error, extracting the erroneous sentence, and generating a corrected sentence. In this paper, we present our approach that achieved top performance in all three subtasks. For the MS dataset, which contains subtle errors, we developed a retrieval-based system leveraging external medical question-answering datasets. For the UW dataset, reflecting more realistic clinical notes, we created a pipeline of modules to detect, localize, and correct errors. Both approaches utilized the DSPy framework for optimizing prompts and few-shot examples in large language model (LLM) based programs. Our results demonstrate the effectiveness of LLM based programs for medical error correction. However, our approach has limitations in addressing the full diversity of potential errors in medical documentation. We discuss the implications of our work and highlight future research directions to advance the robustness and applicability of medical error detection and correction systems. △ Less

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2403.12046 [pdf, other]

GPT-4V(ision) Unsuitable for Clinical Care and Education: A Clinician-Evaluated Assessment

Authors: Senthujan Senkaiahliyan, Augustin Toma, Jun Ma, An-Wen Chan, Andrew Ha, Kevin R. An, Hrishikesh Suresh, Barry Rubin, Bo Wang

Abstract: OpenAI's large multimodal model, GPT-4V(ision), was recently developed for general image interpretation. However, less is known about its capabilities with medical image interpretation and diagnosis. Board-certified physicians and senior residents assessed GPT-4V's proficiency across a range of medical conditions using imaging modalities such as CT scans, MRIs, ECGs, and clinical photographs. Alth… ▽ More OpenAI's large multimodal model, GPT-4V(ision), was recently developed for general image interpretation. However, less is known about its capabilities with medical image interpretation and diagnosis. Board-certified physicians and senior residents assessed GPT-4V's proficiency across a range of medical conditions using imaging modalities such as CT scans, MRIs, ECGs, and clinical photographs. Although GPT-4V is able to identify and explain medical images, its diagnostic accuracy and clinical decision-making abilities are poor, posing risks to patient safety. Despite the potential that large language models may have in enhancing medical education and delivery, the current limitations of GPT-4V in interpreting medical images reinforces the importance of appropriate caution when using it for clinical decision-making. △ Less

Submitted 14 November, 2023; originally announced March 2024.

arXiv:2305.12031 [pdf, other]

Clinical Camel: An Open Expert-Level Medical Language Model with Dialogue-Based Knowledge Encoding

Authors: Augustin Toma, Patrick R. Lawler, Jimmy Ba, Rahul G. Krishnan, Barry B. Rubin, Bo Wang

Abstract: We present Clinical Camel, an open large language model (LLM) explicitly tailored for clinical research. Fine-tuned from LLaMA-2 using QLoRA, Clinical Camel achieves state-of-the-art performance across medical benchmarks among openly available medical LLMs. Leveraging efficient single-GPU training, Clinical Camel surpasses GPT-3.5 in five-shot evaluations on all assessed benchmarks, including 64.3… ▽ More We present Clinical Camel, an open large language model (LLM) explicitly tailored for clinical research. Fine-tuned from LLaMA-2 using QLoRA, Clinical Camel achieves state-of-the-art performance across medical benchmarks among openly available medical LLMs. Leveraging efficient single-GPU training, Clinical Camel surpasses GPT-3.5 in five-shot evaluations on all assessed benchmarks, including 64.3% on the USMLE Sample Exam (compared to 58.5% for GPT-3.5), 77.9% on PubMedQA (compared to 60.2%), 60.7% on MedQA (compared to 53.6%), and 54.2% on MedMCQA (compared to 51.0%). In addition to these benchmarks, Clinical Camel demonstrates its broader capabilities, such as synthesizing plausible clinical notes. This work introduces dialogue-based knowledge encoding, a novel method to synthesize conversational data from dense medical texts. While benchmark results are encouraging, extensive and rigorous human evaluation across diverse clinical scenarios is imperative to ascertain safety before implementation. By openly sharing Clinical Camel, we hope to foster transparent and collaborative research, working towards the safe integration of LLMs within the healthcare domain. Significant challenges concerning reliability, bias, and the potential for outdated knowledge persist. Nonetheless, the transparency provided by an open approach reinforces the scientific rigor essential for future clinical applications. △ Less

Submitted 17 August, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

Comments: for model weights, see https://huggingface.co/wanglab/

arXiv:2305.02220 [pdf, other]

WangLab at MEDIQA-Chat 2023: Clinical Note Generation from Doctor-Patient Conversations using Large Language Models

Authors: John Giorgi, Augustin Toma, Ronald Xie, Sondra S. Chen, Kevin R. An, Grace X. Zheng, Bo Wang

Abstract: This paper describes our submission to the MEDIQA-Chat 2023 shared task for automatic clinical note generation from doctor-patient conversations. We report results for two approaches: the first fine-tunes a pre-trained language model (PLM) on the shared task data, and the second uses few-shot in-context learning (ICL) with a large language model (LLM). Both achieve high performance as measured by… ▽ More This paper describes our submission to the MEDIQA-Chat 2023 shared task for automatic clinical note generation from doctor-patient conversations. We report results for two approaches: the first fine-tunes a pre-trained language model (PLM) on the shared task data, and the second uses few-shot in-context learning (ICL) with a large language model (LLM). Both achieve high performance as measured by automatic metrics (e.g. ROUGE, BERTScore) and ranked second and first, respectively, of all submissions to the shared task. Expert human scrutiny indicates that notes generated via the ICL-based approach with GPT-4 are preferred about as often as human-written notes, making it a promising path toward automated note generation from doctor-patient conversations. △ Less

Submitted 3 June, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

Comments: Camera-ready submission to ClinicalNLP @ ACL 2023

arXiv:2302.08237 [pdf, other]

doi 10.1109/ACCESS.2023.3273770

A Cloud-based Deep Learning Framework for Early Detection of Pushing at Crowded Event Entrances

Authors: Ahmed Alia, Mohammed Maree, Mohcine Chraibi, Anas Toma, Armin Seyfried

Abstract: Crowding at the entrances of large events may lead to critical and life-threatening situations, particularly when people start pushing each other to reach the event faster. Automatic and timely identification of pushing behavior would help organizers and security forces to intervene early and mitigate dangerous situations. In this paper, we propose a cloud-based deep learning framework for automat… ▽ More Crowding at the entrances of large events may lead to critical and life-threatening situations, particularly when people start pushing each other to reach the event faster. Automatic and timely identification of pushing behavior would help organizers and security forces to intervene early and mitigate dangerous situations. In this paper, we propose a cloud-based deep learning framework for automatic early detection of pushing in crowded event entrances. The proposed framework initially modifies and trains the EfficientNetV2B0 Convolutional Neural Network model. Subsequently, it integrates the adapted model with an accurate and fast pre-trained deep optical flow model with the color wheel method to analyze video streams and identify pushing patches in real-time. Moreover, the framework uses live capturing technology and a cloud-based environment to collect video streams of crowds in real-time and provide early-stage results. A novel dataset is generated based on five real-world experiments and their associated ground truth data to train the adapted EfficientNetV2B0 model. The experimental setups simulated a crowded event entrance, while the ground truths for each video experiment was generated manually by social psychologists. Several experiments on the videos and the generated dataset are carried out to evaluate the accuracy and annotation delay time of the proposed framework. The experimental results show that the proposed framework identified pushing behaviors with an accuracy rate of 87% within a reasonable delay time. △ Less

Submitted 9 June, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

Journal ref: 2023, IEEE Access

arXiv:2203.03092 [pdf, other]

Systematic Comparison of Path Planning Algorithms using PathBench

Authors: Hao-Ya Hsueh, Alexandru-Iosif Toma, Hussein Ali Jaafar, Edward Stow, Riku Murai, Paul H. J. Kelly, Sajad Saeedi

Abstract: Path planning is an essential component of mobile robotics. Classical path planning algorithms, such as wavefront and rapidly-exploring random tree (RRT) are used heavily in autonomous robots. With the recent advances in machine learning, development of learning-based path planning algorithms has been experiencing rapid growth. An unified path planning interface that facilitates the development an… ▽ More Path planning is an essential component of mobile robotics. Classical path planning algorithms, such as wavefront and rapidly-exploring random tree (RRT) are used heavily in autonomous robots. With the recent advances in machine learning, development of learning-based path planning algorithms has been experiencing rapid growth. An unified path planning interface that facilitates the development and benchmarking of existing and new algorithms is needed. This paper presents PathBench, a platform for develo**, visualizing, training, testing, and benchmarking of existing and future, classical and learning-based path planning algorithms in 2D and 3D grid world environments. Many existing path planning algorithms are supported; e.g. A*, Dijkstra, waypoint planning networks, value iteration networks, gated path planning networks; and integrating new algorithms is easy and clearly specified. The benchmarking ability of PathBench is explored in this paper by comparing algorithms across five different hardware systems and three different map types, including built-in PathBench maps, video game maps, and maps from real world databases. Metrics, such as path length, success rate, and computational time, were used to evaluate algorithms. Algorithmic analysis was also performed on a real world robot to demonstrate PathBench's support for Robot Operating System (ROS). PathBench is open source. △ Less

Submitted 6 March, 2022; originally announced March 2022.

Comments: Accepted to Advanced Robotics Journal; 23 pages, 9 figures, 4 tables. arXiv admin note: substantial text overlap with arXiv:2105.01777

arXiv:2105.01777 [pdf, other]

PathBench: A Benchmarking Platform for Classical and Learned Path Planning Algorithms

Authors: Alexandru-Iosif Toma, Hao-Ya Hsueh, Hussein Ali Jaafar, Riku Murai, Paul H. J. Kelly, Sajad Saeedi

Abstract: Path planning is a key component in mobile robotics. A wide range of path planning algorithms exist, but few attempts have been made to benchmark the algorithms holistically or unify their interface. Moreover, with the recent advances in deep neural networks, there is an urgent need to facilitate the development and benchmarking of such learning-based planning algorithms. This paper presents PathB… ▽ More Path planning is a key component in mobile robotics. A wide range of path planning algorithms exist, but few attempts have been made to benchmark the algorithms holistically or unify their interface. Moreover, with the recent advances in deep neural networks, there is an urgent need to facilitate the development and benchmarking of such learning-based planning algorithms. This paper presents PathBench, a platform for develo**, visualizing, training, testing, and benchmarking of existing and future, classical and learned 2D and 3D path planning algorithms, while offering support for Robot Oper-ating System (ROS). Many existing path planning algorithms are supported; e.g. A*, wavefront, rapidly-exploring random tree, value iteration networks, gated path planning networks; and integrating new algorithms is easy and clearly specified. We demonstrate the benchmarking capability of PathBench by comparing implemented classical and learned algorithms for metrics, such as path length, success rate, computational time and path deviation. These evaluations are done on built-in PathBench maps and external path planning environments from video games and real world databases. PathBench is open source. △ Less

Submitted 4 May, 2021; originally announced May 2021.

Comments: The Conference on Robots and Vision (CRV2021), Supplementary Website: https://sites.google.com/view/pathbench/

arXiv:2105.00312 [pdf, other]

Waypoint Planning Networks

Authors: Alexandru-Iosif Toma, Hussein Ali Jaafar, Hao-Ya Hsueh, Stephen James, Daniel Lenton, Ronald Clark, Sajad Saeedi

Abstract: With the recent advances in machine learning, path planning algorithms are also evolving; however, the learned path planning algorithms often have difficulty competing with success rates of classic algorithms. We propose waypoint planning networks (WPN), a hybrid algorithm based on LSTMs with a local kernel - a classic algorithm such as A*, and a global kernel using a learned algorithm. WPN produc… ▽ More With the recent advances in machine learning, path planning algorithms are also evolving; however, the learned path planning algorithms often have difficulty competing with success rates of classic algorithms. We propose waypoint planning networks (WPN), a hybrid algorithm based on LSTMs with a local kernel - a classic algorithm such as A*, and a global kernel using a learned algorithm. WPN produces a more computationally efficient and robust solution. We compare WPN against A*, as well as related works including motion planning networks (MPNet) and value iteration networks (VIN). In this paper, the design and experiments have been conducted for 2D environments. Experimental results outline the benefits of WPN, both in efficiency and generalization. It is shown that WPN's search space is considerably less than A*, while being able to generate near optimal results. Additionally, WPN works on partial maps, unlike A* which needs the full map in advance. The code is available online. △ Less

Submitted 1 May, 2021; originally announced May 2021.

Comments: The Conference on Robots and Vision (CRV2021) Supplementary Website: https://sites.google.com/view/waypoint-planning-networks

arXiv:1407.8056 [pdf, other]

doi 10.1016/j.comnet.2015.06.003

Emitter Localisation from Reception Timestamps in Asynchronous Networks

Authors: Nicolo' Facchi, Francesco Gringoli, Fabio Ricciato, Andrea Toma

Abstract: We address the problem of localising a mobile terminal ("blind" node) in unknown position from a set of "anchor" nodes in known positions. The proposed method does not require any form of node synchronisation nor measurement (or control) of the transmission times, which is difficult or anyway costly to achieve in practice. It relies exclusively on reception timestamps collected by the anchor nodes… ▽ More We address the problem of localising a mobile terminal ("blind" node) in unknown position from a set of "anchor" nodes in known positions. The proposed method does not require any form of node synchronisation nor measurement (or control) of the transmission times, which is difficult or anyway costly to achieve in practice. It relies exclusively on reception timestamps collected by the anchor nodes, according to their local clocks, that overhear packets transmitted by the blind node and by (at least one) other transmitting node(s) in known position, e.g., other anchors. The clock differences between the nodes are not eliminated ex ante through clock synchronisation, as in traditional ToA and TDoA methods. Instead, they are counteracted ex post, during the data processing stage, leveraging the data redundancy that is intrinsic to the multiple reception of the same packet by different (anchor) nodes. We validate the proposed method in different experimental settings, indoor and outdoor, using exclusively low-cost Commercial-Off-The-Shelf WiFi devices, achieving sub-meter accuracy in full Line-of-Sight conditions and meter-level accuracy in mild Non-line-of-sight environment. The proposed method does not require that the blind node participate actively to the localisation procedure and can use "opportunistically" any legacy signal or packet available over-the-air for communication purposes. Considering the very minimal requirement on the system - basically, only that anchors in known positions are able to collect and share reception timestamps - the proposed approach can enable practical adoption of opportunistic and/or cooperative localisation on top of existing radio communication systems. △ Less

Submitted 14 July, 2015; v1 submitted 30 July, 2014; originally announced July 2014.

Comments: To appear in Computer Networks

Showing 1–10 of 10 results for author: Toma, A