-
Data Format Standardization and DICOM Integration for Hyperpolarized 13C MRI
Authors:
Ernesto Diaz,
Renuka Sriram,
Jeremy W. Gordon,
Avantika Sinha,
Xiaoxi Liu,
Sule Sahin,
Jason Crane,
Marram P Olson,
Hsin-Yu Chen,
Jenna Bernard,
Daniel B. Vigneron,
Zhen Jane Wang,
Duan Xu,
Peder E. Z. Larson
Abstract:
Hyperpolarized (HP) 13C MRI has shown promise as a valuable modality for in vivo measurements of metabolism and is currently in human trials at 15 research sites worldwide. With this growth it is important to adopt standardized data storage practices as it will allow sites to meaningfully compare data.
In this paper we (1) describe data that we believe should be stored and (2) demonstrate pipelines and methods that utilize the Digital Imaging and Communications in Medicine (DICOM) standard. This includes proposing a minimum set of information that is specific to HP 13C MRI studies. We then show where the majority of this information fits into existing DICOM Attributes, primarily via the "Contrast/Bolus" module.
We also demonstrate pipelines for utilizing DICOM for HP 13C MRI. DICOM is the most common standard for clinical medical image storage and provides the flexibility to accommodate the unique aspects of HP 13C MRI, including the HP agent information as well as spectroscopic and metabolite dimensions. The pipelines shown include creating DICOM objects for studies on human and animal imaging systems with various pulse sequences. We also show a Python-based method to efficiently modify DICOM objects to incorporate the unique HP 13C MRI information that is not captured by existing pipelines. Moreover, we propose best practices for HP 13C MRI data storage that will support future multi-site trials, research studies and technical developments of this imaging technique.
Submitted 5 May, 2024;
originally announced May 2024.
-
LineConGraphs: Line Conversation Graphs for Effective Emotion Recognition using Graph Neural Networks
Authors:
Gokul S Krishnan,
Sarala Padi,
Craig S. Greenberg,
Balaraman Ravindran,
Dinesh Manocha,
Ram D. Sriram
Abstract:
Emotion Recognition in Conversations (ERC) is a critical aspect of affective computing, and it has many practical applications in healthcare, education, chatbots, and social media platforms. Earlier approaches for ERC analysis involved modeling both speaker and long-term contextual information using graph neural network architectures. However, it is ideal to deploy speaker-independent models for real-world applications. Additionally, long context windows can potentially create confusion in recognizing the emotion of an utterance in a conversation. To overcome these limitations, we propose novel line conversation graph convolutional network (LineConGCN) and graph attention (LineConGAT) models for ERC analysis. These models are speaker-independent and built using a graph construction strategy for conversations: line conversation graphs (LineConGraphs). The conversational context in LineConGraphs is short-term, limited to one previous and one future utterance, and speaker information is not part of the graph. We evaluate the performance of our proposed models on two benchmark datasets, IEMOCAP and MELD, and show that our LineConGAT model outperforms the state-of-the-art methods with F1-scores of 64.58% and 76.50%, respectively. Moreover, we demonstrate that embedding sentiment shift information into line conversation graphs further enhances the ERC performance in the case of GCN models.
Submitted 4 December, 2023;
originally announced December 2023.
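The graph construction described in the LineConGraphs abstract above (context limited to one previous and one future utterance, with no speaker information) can be illustrated with a minimal sketch. The function names and the edge-list representation are hypothetical choices made for illustration; they are not taken from the paper, which may build its graphs differently.

```python
def build_line_conversation_graph(num_utterances):
    """Return undirected edges (i, i + 1) linking consecutive utterances.

    Assumption: nodes are utterance indices 0..num_utterances-1 and no
    speaker identity is attached to nodes or edges.
    """
    return [(i, i + 1) for i in range(num_utterances - 1)]


def neighbours(num_utterances, edges):
    """Build an adjacency list so each utterance sees only its direct context."""
    adjacency = {i: [] for i in range(num_utterances)}
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)
    return adjacency
```

In this sketch an interior utterance has exactly two neighbours (the previous and the next one), while the first and last utterances have one each, which is what keeps the context short-term.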
-
Foundation Metrics for Evaluating Effectiveness of Healthcare Conversations Powered by Generative AI
Authors:
Mahyar Abbasian,
Elahe Khatibi,
Iman Azimi,
David Oniani,
Zahra Shakeri Hossein Abad,
Alexander Thieme,
Ram Sriram,
Zhongqi Yang,
Yanshan Wang,
Bryant Lin,
Olivier Gevaert,
Li-Jia Li,
Ramesh Jain,
Amir M. Rahmani
Abstract:
Generative Artificial Intelligence is set to revolutionize healthcare delivery by transforming traditional patient care into a more personalized, efficient, and proactive process. Chatbots, serving as interactive conversational models, will probably drive this patient-centered transformation in healthcare. Through the provision of various services, including diagnosis, personalized lifestyle recommendations, and mental health support, the objective is to substantially augment patient health outcomes, all the while mitigating the workload burden on healthcare providers. The life-critical nature of healthcare applications necessitates establishing a unified and comprehensive set of evaluation metrics for conversational models. Existing evaluation metrics proposed for various generic large language models (LLMs) demonstrate a lack of comprehension regarding medical and health concepts and their significance in promoting patients' well-being. Moreover, these metrics neglect pivotal user-centered aspects, including trust-building, ethics, personalization, empathy, user comprehension, and emotional support. The purpose of this paper is to explore state-of-the-art LLM-based evaluation metrics that are specifically applicable to the assessment of interactive conversational models in healthcare. Subsequently, we present a comprehensive set of evaluation metrics designed to thoroughly assess the performance of healthcare chatbots from an end-user perspective. These metrics encompass an evaluation of language processing abilities, impact on real-world clinical tasks, and effectiveness in user-interactive conversations. Finally, we engage in a discussion concerning the challenges associated with defining and implementing these metrics, with particular emphasis on confounding factors such as the target audience, evaluation methods, and prompt techniques involved in the evaluation process.
Submitted 28 February, 2024; v1 submitted 21 September, 2023;
originally announced September 2023.
-
Control of bow shock induced three-dimensional separation using bleed through holes
Authors:
Hemanth Chandravamsi,
Sourabh Bhardwaj,
K. Ramachandra,
R. Sriram
Abstract:
The unsteady three-dimensional separated flow on a wall induced by a square protrusion (approximately twice the local boundary layer thickness in width and height), and its control by means of passive suction through holes, are investigated using wind tunnel experiments at Mach $2.87$. The baseline flow without any control was characterized and compared against the cases with bleed. A bow-shaped separation line on the wall with a mid-span separation length of $5.57δ$ from the protrusion face was traced from oil-flow visualization. The averaged pressure distribution, surveyed using static pressure ports placed on the wall, mapped plateau, high-pressure, and low-pressure regions in the separated flow, characteristic of three-dimensional interactions. Ten control configurations were tested with suction holes placed along mid-span in the different pressure zones. A significant spanwise `Mean Reduction in Separation Length' of up to $0.93δ$ was observed from oil-flow visualization. A comparison of observations from various control configurations suggested that bleeding the flow from the high-pressure region could in general delay the separation and reduce the bubble size. Further, time-resolved schlieren visualizations confirmed a reduction in both `mid-span separation length' and `shock-intermittent-region' with the introduction of suction in the high-pressure region. Fourier and Proper Orthogonal Decomposition analyses of the schlieren data confirmed the presence of low-frequency separation-shock oscillations at Strouhal numbers of order $10^{-2}$, both with and without control. Furthermore, the amplitudes of separation-shock oscillations in the spectrum were reduced with the introduction of suction simultaneously from two holes placed in the high- and low-pressure regions.
Submitted 22 November, 2022;
originally announced November 2022.
-
MSVIPER: Improved Policy Distillation for Reinforcement-Learning-Based Robot Navigation
Authors:
Aaron M. Roth,
Jing Liang,
Ram Sriram,
Elham Tabassi,
Dinesh Manocha
Abstract:
We present Multiple Scenario Verifiable Reinforcement Learning via Policy Extraction (MSVIPER), a new method for policy distillation to decision trees for improved robot navigation. MSVIPER learns an "expert" policy using any Reinforcement Learning (RL) technique involving learning a state-action mapping and then uses imitation learning to learn a decision-tree policy from it. We demonstrate that MSVIPER results in efficient decision trees and can accurately mimic the behavior of the expert policy. Moreover, we present efficient policy distillation and tree-modification techniques that take advantage of the decision tree structure to allow improvements to a policy without retraining. We use our approach to improve the performance of RL-based robot navigation algorithms for indoor and outdoor scenes. We demonstrate the benefits in terms of reduced freezing and oscillation behaviors (by up to 95%) for mobile robots navigating among dynamic obstacles and reduced vibrations and oscillation (by up to 17%) for outdoor robot navigation on complex, uneven terrains.
Submitted 19 September, 2022;
originally announced September 2022.
-
Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models
Authors:
Sarala Padi,
Seyed Omid Sadjadi,
Dinesh Manocha,
Ram D. Sriram
Abstract:
Automatic emotion recognition plays a key role in computer-human interaction as it has the potential to enrich the next-generation artificial intelligence with emotional intelligence. It finds applications in customer and/or representative behavior analysis in call centers, gaming, personal assistants, and social robots, to mention a few. Therefore, there has been an increasing demand to develop robust automatic methods to analyze and recognize the various emotions. In this paper, we propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities. More specifically, we i) adapt a residual network (ResNet) based model trained on a large-scale speaker recognition task using transfer learning along with a spectrogram augmentation approach to recognize emotions from speech, and ii) use a fine-tuned bidirectional encoder representations from transformers (BERT) based model to represent and recognize emotions from the text. The proposed system then combines the ResNet and BERT-based model scores using a late fusion strategy to further improve the emotion recognition performance. The proposed multimodal solution addresses the data scarcity limitation in emotion recognition using transfer learning, data augmentation, and fine-tuning, thereby improving the generalization performance of the emotion recognition models. We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture (IEMOCAP) dataset. Experimental results indicate that both audio and text-based models improve the emotion recognition performance and that the proposed multimodal solution achieves state-of-the-art results on the IEMOCAP benchmark.
Submitted 15 February, 2022;
originally announced February 2022.
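The late fusion step mentioned in the abstract above, which combines the scores of the speech model and the text model, can be sketched as follows. The abstract does not state the fusion rule, so the weighted average used here, the `speech_weight` parameter, and the function names are all assumptions made for illustration.

```python
def late_fusion(speech_scores, text_scores, speech_weight=0.5):
    """Combine per-class scores from two models with a weighted average.

    Assumption: both score lists are aligned to the same class order.
    """
    if len(speech_scores) != len(text_scores):
        raise ValueError("score vectors must cover the same classes")
    text_weight = 1.0 - speech_weight
    return [
        speech_weight * s + text_weight * t
        for s, t in zip(speech_scores, text_scores)
    ]


def predict_label(fused_scores, labels):
    """Return the label whose fused score is highest."""
    best_index = max(range(len(fused_scores)), key=fused_scores.__getitem__)
    return labels[best_index]
```

Fusing at the score level keeps the two models independent: each can be trained and tuned on its own modality, and only the final class scores need to be aligned.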
-
Unsteady pulsating flowfield over spiked axisymmetric forebody at hypersonic flows
Authors:
S. Mohammed Ibrahim,
R. Sriram,
S. K. Karthick,
G. Jagadeesh
Abstract:
The paper gives experimental observations on the hypersonic flow past an axisymmetric flat-face cylinder with a protruding sharp-tip spike at a freestream Mach number of $M_\infty = 8.16$ at two different freestream Reynolds numbers based on the base body diameter ($Re_D = 0.76 \times 10^6$ and $3.05 \times 10^6$). Furthermore, modal analysis is done on schlieren images to understand the flow dynamics in parallel with the unsteady pressure measurements. The protruding spike of length to base body diameter ratio of $[l/D]=1$ creates a familiar unsteady flowfield called 'pulsation.' Pressure loading and fluctuation intensity for the two $Re_D$ cases are calculated. A maximum drop of 98.24% is observed in both parameters between the high and low $Re_D$ cases. Based on the analysis, a difference in the pulsation characteristics is noticed, which arises from two vortical zones, each from a system of two `$λ$' shocks formed during the `collapse' phase ahead of the base body. The interaction of shedding vortices from the $λ$-shocks' triple-points, along with the rotating stationary waves, contributes to the asymmetric high-pressure loading and the observation of shock pulsation on the flat-face cylinder. The vortical interactions form the second dominant spatial mode with a temporal mode carrying a dimensionless frequency ($f_2D/u_\infty \approx 0.34$) almost twice that of the fundamental frequency ($f_1D/u_\infty \approx 0.17$). The observed frequencies are the same for both $Re_D$ cases. However, for the high-frequency range, the spectral pressure decay is observed to follow an inverse law and a $-7/3$ law for the low and high $Re_D$ cases, respectively.
Submitted 19 December, 2021; v1 submitted 5 November, 2021;
originally announced November 2021.
-
Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation
Authors:
Sarala Padi,
Seyed Omid Sadjadi,
Dinesh Manocha,
Ram D. Sriram
Abstract:
Automatic speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction. One of the main challenges in SER is data scarcity, i.e., insufficient amounts of carefully labeled data to build and fully explore complex deep learning models for emotion classification. This paper aims to address this challenge using a transfer learning strategy combined with spectrogram augmentation. Specifically, we propose a transfer learning approach that leverages a pre-trained residual network (ResNet) model including a statistics pooling layer from speaker recognition trained using large amounts of speaker-labeled data. The statistics pooling layer enables the model to efficiently process variable-length input, thereby eliminating the need for sequence truncation which is commonly used in SER systems. In addition, we adopt a spectrogram augmentation technique to generate additional training data samples by applying random time-frequency masks to log-mel spectrograms to mitigate overfitting and improve the generalization of emotion recognition models. We evaluate the effectiveness of our proposed approach on the interactive emotional dyadic motion capture (IEMOCAP) dataset. Experimental results indicate that the transfer learning and spectrogram augmentation approaches improve the SER performance, and when combined achieve state-of-the-art results.
Submitted 16 August, 2021; v1 submitted 5 August, 2021;
originally announced August 2021.
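The spectrogram augmentation described in the abstract above (random time-frequency masks applied to log-mel spectrograms) can be sketched in plain Python. The nested-list layout `spec[frequency][time]`, the zero fill value, the single mask per axis, and the parameter names are assumptions made for illustration; the paper's actual masking policy is not given in the abstract.

```python
import random


def mask_spectrogram(spec, max_freq_width, max_time_width, rng, fill=0.0):
    """Return a copy of spec with one random frequency band and one random
    time span overwritten by fill. The input is left unmodified."""
    n_freq = len(spec)
    n_time = len(spec[0])
    out = [row[:] for row in spec]

    # Frequency mask: blank a band of consecutive frequency bins.
    f_width = rng.randint(0, min(max_freq_width, n_freq))
    f_start = rng.randint(0, n_freq - f_width)
    for f in range(f_start, f_start + f_width):
        out[f] = [fill] * n_time

    # Time mask: blank a span of consecutive frames in every bin.
    t_width = rng.randint(0, min(max_time_width, n_time))
    t_start = rng.randint(0, n_time - t_width)
    for row in out:
        for t in range(t_start, t_start + t_width):
            row[t] = fill
    return out


# Usage: each call with a fresh random draw yields a new training sample.
augmented = mask_spectrogram([[1.0] * 6 for _ in range(4)], 2, 3, random.Random(0))
```

Passing the random generator explicitly makes the augmentation reproducible, which helps when comparing runs.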
-
Multi-Window Data Augmentation Approach for Speech Emotion Recognition
Authors:
Sarala Padi,
Dinesh Manocha,
Ram D. Sriram
Abstract:
We present a Multi-Window Data Augmentation (MWA-SER) approach for speech emotion recognition. MWA-SER is a unimodal approach that focuses on two key concepts: designing the speech augmentation method and building the deep learning model to recognize the underlying emotion of an audio signal. Our proposed multi-window augmentation approach generates additional data samples from the speech signal by employing multiple window sizes in the audio feature extraction process. We show that our augmentation method, combined with a deep learning model, improves speech emotion recognition performance. We evaluate the performance of our approach on three benchmark datasets: IEMOCAP, SAVEE, and RAVDESS. We show that the multi-window model improves the SER performance and outperforms a single-window model. Finding the best window size is an essential step in audio feature extraction. We perform extensive experimental evaluations to find the best window choice and explore the windowing effect for SER analysis.
Submitted 15 February, 2022; v1 submitted 19 October, 2020;
originally announced October 2020.
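The idea in the abstract above of generating additional samples by using multiple window sizes during feature extraction can be sketched as below. This is only an illustration of framing one signal at several window sizes: the half-window hop, the function names, and the use of raw frames in place of real audio features are assumptions, not the paper's pipeline.

```python
def frame_signal(signal, window_size, hop_size):
    """Split a 1-D signal into overlapping frames of window_size samples."""
    last_start = len(signal) - window_size
    return [
        signal[start:start + window_size]
        for start in range(0, last_start + 1, hop_size)
    ]


def multi_window_views(signal, window_sizes):
    """Frame the same signal once per window size.

    Each view would feed the feature extractor separately, so one
    utterance yields len(window_sizes) training samples.
    Assumption: the hop is half the window, with a minimum of one sample.
    """
    return {
        size: frame_signal(signal, size, max(1, size // 2))
        for size in window_sizes
    }
```

Because every view is derived from the same utterance, all of them share that utterance's emotion label.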
-
Adaptive and Dynamic Wireless Routers with Smart Antenna for Power Management
Authors:
S. Venkata Krishnan,
R. Sriram,
N. Senthil Kumar
Abstract:
In the recent evolution of wireless technologies, power management has been a concern. To overcome power shortages, steps are being taken to find new kinds of energy harvesting methods, power attenuation reduction methods, and power saving techniques. Although wireless routers do not consume much power, battery-powered devices require a lot. An omnidirectional antenna embedded with multiple antennae that focus the beam of radio wave signals in the direction of nodes with the least transmission angle, called a "Smart Antenna", can be a solution to this problem. To reduce power maceration, we use adaptive and dynamic transmission, in which the transmission angle of the antennae is varied in accordance with the movement of nodes. Apart from saving considerable power, it also improves the signal strength.
Submitted 21 February, 2012;
originally announced February 2012.