-
D-Rax: Domain-specific Radiologic assistant leveraging multi-modal data and eXpert model predictions
Authors:
Hareem Nisar,
Syed Muhammad Anwar,
Zhifan Jiang,
Abhijeet Parida,
Vishwesh Nath,
Holger R. Roth,
Marius George Linguraru
Abstract:
Large vision language models (VLMs) have progressed incredibly from research to applicability for general-purpose use cases. LLaVA-Med, a pioneering large language and vision assistant for biomedicine, can perform multi-modal biomedical image and data analysis to provide a natural language interface for radiologists. While it is highly generalizable and works with multi-modal data, it is currently…
▽ More
Large vision language models (VLMs) have progressed incredibly from research to applicability for general-purpose use cases. LLaVA-Med, a pioneering large language and vision assistant for biomedicine, can perform multi-modal biomedical image and data analysis to provide a natural language interface for radiologists. While it is highly generalizable and works with multi-modal data, it is currently limited by well-known challenges that exist in the large language model space. Hallucinations and imprecision in responses can lead to misdiagnosis which currently hinder the clinical adaptability of VLMs. To create precise, user-friendly models in healthcare, we propose D-Rax -- a domain-specific, conversational, radiologic assistance tool that can be used to gain insights about a particular radiologic image. In this study, we enhance the conversational analysis of chest X-ray (CXR) images to support radiological reporting, offering comprehensive insights from medical imaging and aiding in the formulation of accurate diagnosis. D-Rax is achieved by fine-tuning the LLaVA-Med architecture on our curated enhanced instruction-following data, comprising of images, instructions, as well as disease diagnosis and demographic predictions derived from MIMIC-CXR imaging data, CXR-related visual question answer (VQA) pairs, and predictive outcomes from multiple expert AI models. We observe statistically significant improvement in responses when evaluated for both open and close-ended conversations. Leveraging the power of state-of-the-art diagnostic models combined with VLMs, D-Rax empowers clinicians to interact with medical images using natural language, which could potentially streamline their decision-making process, enhance diagnostic accuracy, and conserve their time.
△ Less
Submitted 2 July, 2024;
originally announced July 2024.
-
AR3n: A Reinforcement Learning-based Assist-As-Needed Controller for Robotic Rehabilitation
Authors:
Shrey Pareek,
Harris Nisar,
Thenkurussi Kesavadas
Abstract:
In this paper, we present AR3n (pronounced as Aaron), an assist-as-needed (AAN) controller that utilizes reinforcement learning to supply adaptive assistance during a robot assisted handwriting rehabilitation task. Unlike previous AAN controllers, our method does not rely on patient specific controller parameters or physical models. We propose the use of a virtual patient model to generalize AR3n…
▽ More
In this paper, we present AR3n (pronounced as Aaron), an assist-as-needed (AAN) controller that utilizes reinforcement learning to supply adaptive assistance during a robot assisted handwriting rehabilitation task. Unlike previous AAN controllers, our method does not rely on patient specific controller parameters or physical models. We propose the use of a virtual patient model to generalize AR3n across multiple subjects. The system modulates robotic assistance in realtime based on a subject's tracking error, while minimizing the amount of robotic assistance. The controller is experimentally validated through a set of simulations and human subject experiments. Finally, a comparative study with a traditional rule-based controller is conducted to analyze differences in assistance mechanisms of the two controllers.
△ Less
Submitted 16 April, 2023; v1 submitted 28 February, 2023;
originally announced March 2023.
-
Role of Temporal Diversity in Inferring Social Ties Based on Spatio-Temporal Data
Authors:
Deshana Desai,
Harsh Nisar,
Rishab Bhardawaj
Abstract:
The last two decades have seen a tremendous surge in research on social networks and their implications. The studies includes inferring social relationships, which in turn have been used for target advertising, recommendations, search customization etc. However, the offline experiences of human, the conversations with people and face-to-face interactions that govern our lives interactions have rec…
▽ More
The last two decades have seen a tremendous surge in research on social networks and their implications. The studies includes inferring social relationships, which in turn have been used for target advertising, recommendations, search customization etc. However, the offline experiences of human, the conversations with people and face-to-face interactions that govern our lives interactions have received lesser attention. We introduce DAIICT Spatio-Temporal Network (DSSN), a spatiotemporal dataset of 0.7 million data points of continuous location data logged at an interval of every 2 minutes by mobile phones of 46 subjects. Our research is focused at inferring relationship strength between students based on the spatiotemporal data and comparing the results with the self-reported data. In that pursuit we introduce Temporal Diversity, which we show to be superior in its contribution to predicting relationship strength than its counterparts. We also explore the evolving nature of Temporal Diversity with time. Our rich dataset opens various other avenues of research that require fine-grained location data with bounded movement of participants within a limited geographical area. The advantage of having a bounded geographical area such as a university campus is that it provides us with a microcosm of the real world, where each such geographic zone has an internal context and function and a high percentage of mobility is governed by schedules and time-tables. The bounded geographical region in addition to the age homogeneous population gives us a minute look into the active internal socialization of students in a university.
△ Less
Submitted 10 November, 2016;
originally announced November 2016.
-
Can Evolutionary Sampling Improve Bagged Ensembles?
Authors:
Harsh Nisar,
Bhanu Pratap Singh Rawat
Abstract:
Perturb and Combine (P&C) group of methods generate multiple versions of the predictor by perturbing the training set or construction and then combining them into a single predictor (Breiman, 1996b). The motive is to improve the accuracy in unstable classification and regression methods. One of the most well known method in this group is Bagging. Arcing or Adaptive Resampling and Combining methods…
▽ More
Perturb and Combine (P&C) group of methods generate multiple versions of the predictor by perturbing the training set or construction and then combining them into a single predictor (Breiman, 1996b). The motive is to improve the accuracy in unstable classification and regression methods. One of the most well known method in this group is Bagging. Arcing or Adaptive Resampling and Combining methods like AdaBoost are smarter variants of P&C methods. In this extended abstract, we lay the groundwork for a new family of methods under the P&C umbrella, known as Evolutionary Sampling (ES). We employ Evolutionary algorithms to suggest smarter sampling in both the feature space (sub-spaces) as well as training samples. We discuss multiple fitness functions to assess ensembles and empirically compare our performance against randomized sampling of training data and feature sub-spaces.
△ Less
Submitted 3 October, 2016;
originally announced October 2016.