Search | arXiv e-print repository

Can LLMs Augment Low-Resource Reading Comprehension Datasets? Opportunities and Challenges

Authors: Vinay Samuel, Houda Aynaou, Arijit Ghosh Chowdhury, Karthik Venkat Ramanan, Aman Chadha

Abstract: Large Language Models (LLMs) have demonstrated impressive zero-shot performance on a wide range of NLP tasks, demonstrating the ability to reason and apply commonsense. A relevant application is to use them for creating high-quality synthetic datasets for downstream tasks. In this work, we probe whether GPT-4 can be used to augment existing extractive reading comprehension datasets. Automating data annotation processes has the potential to save large amounts of the time, money and effort that go into manually labelling datasets. In this paper, we evaluate the performance of GPT-4 as a replacement for human annotators for low-resource reading comprehension tasks, by comparing performance after fine-tuning and the cost associated with annotation. This work serves as the first analysis of LLMs as synthetic data augmenters for QA systems, highlighting the unique opportunities and challenges. Additionally, we release augmented versions of low-resource datasets that will allow the research community to create further benchmarks for evaluation of generated datasets.

Submitted 21 September, 2023; originally announced September 2023.

Comments: 5 pages, 1 figure, 3 tables

arXiv:2306.09175 [pdf, other]

A Novel Approach to Encode Two-Way Epistatic Interactions Between Single Nucleotide Polymorphisms

Authors: Nathaniel Gunter, Prashanthi Vemuri, Vijay Ramanan, Robel K Gebre

Abstract: Modelling gene-gene epistatic interactions when computing genetic risk scores is not a well-explored subfield of genetics and could have potential to improve risk stratification in practice. Though applications of machine learning (ML) show promise as an avenue of improvement for current genetic risk assessments, they frequently suffer from the problem of too many features and too little data. We propose a method that, when combined with ML, allows information from individual genetic contributors to be preserved while incorporating information on their interactions in a single feature. This allows second-order analysis, while simultaneously increasing the number of input features to ML models as little as possible. We presented three methods that can be utilized to account for genetic interactions. We found that interaction methods that preserved information from the constituent SNPs performed significantly better than the simplest interaction method. Since the currently available ML methods are able to account for complex interactions, utilizing raw SNP genotypes alone is sufficient because the simplest model outperforms all the interaction methods. Given that understanding and accounting for epistatic interactions is one of the most promising avenues for increasing explained variability in heritable disease, this work represents a first step toward an algorithmic interaction method that preserves the information in each component.
This is relevant not only because of potential improvements in model quality, but also because explicit interaction terms allow a human-readable interpretation of potential interaction pathways within the disease.

Submitted 15 June, 2023; originally announced June 2023.

arXiv:2110.01659 [pdf, other]

Cross-Modal Virtual Sensing for Combustion Instability Monitoring

Authors: Tryambak Gangopadhyay, Vikram Ramanan, Satyanarayanan R Chakravarthy, Soumik Sarkar

Abstract: In many cyber-physical systems, imaging can be an important but expensive or 'difficult to deploy' sensing modality. One such example is detecting combustion instability using flame images, where deep learning frameworks have demonstrated state-of-the-art performance. The proposed frameworks are also shown to be quite trustworthy such that domain experts can have sufficient confidence to use these models in real systems to prevent unwanted incidents. However, flame imaging is not a common sensing modality in engine combustors today. Therefore, the current roadblock exists on the hardware side regarding the acquisition and processing of high-volume flame images. On the other hand, the acoustic pressure time series is a more feasible modality for data collection in real combustors. To utilize acoustic time series as a sensing modality, we propose a novel cross-modal encoder-decoder architecture that can reconstruct cross-modal visual features from acoustic pressure time series in combustion systems. With the "distillation" of cross-modal features, the results demonstrate that the detection accuracy can be enhanced using the virtual visual sensing modality. By providing the benefit of cross-modal reconstruction, our framework can prove to be useful in different domains well beyond the power generation and transportation industries.

Submitted 6 October, 2021; v1 submitted 4 October, 2021; originally announced October 2021.

arXiv:2101.01877 [pdf, other]

3D Convolutional Selective Autoencoder For Instability Detection in Combustion Systems

Authors: Tryambak Gangopadhyay, Vikram Ramanan, Adedotun Akintayo, Paige K Boor, Soumalya Sarkar, Satyanarayanan R Chakravarthy, Soumik Sarkar

Abstract: While analytical solutions of critical (phase) transitions in physical systems are abundant for simple nonlinear systems, such analysis remains intractable for real-life dynamical systems. A key example of such a physical system is thermoacoustic instability in combustion, where prediction or early detection of an onset of instability is a hard technical challenge, which needs to be addressed to build safer and more energy-efficient gas turbine engines powering aerospace and energy industries. The instabilities arising in combustion chambers of engines are mathematically too complex to model. To address this issue in a data-driven manner instead, we propose a novel deep learning architecture called 3D convolutional selective autoencoder (3D-CSAE) to detect the evolution of self-excited oscillations using spatiotemporal data, i.e., high-speed videos taken from a swirl-stabilized combustor (laboratory surrogate of gas turbine engine combustor). 3D-CSAE consists of filters to learn, in a hierarchical fashion, the complex visual and dynamic features related to combustion instability. We train the 3D-CSAE on frames of videos obtained from a limited set of operating conditions. We select the 3D-CSAE hyper-parameters that are effective for characterizing hierarchical and multiscale instability structure evolution by utilizing the dynamic information available in the video. The proposed model clearly shows performance improvement in detecting the precursors of instability. The machine learning-driven results are verified with physics-based off-line measures.
Advanced active control mechanisms can directly leverage the proposed online detection capability of 3D-CSAE to mitigate the adverse effects of combustion instabilities on the engine operating under various stringent requirements and conditions.

Submitted 6 January, 2021; originally announced January 2021.

arXiv:1708.04577 [pdf, other]

doi 10.1371/journal.pcbi.1005939

Interactions between species introduce spurious associations in microbiome studies

Authors: Rajita Menon, Vivek Ramanan, Kirill S. Korolev

Abstract: Microbiota contribute to many dimensions of host phenotype, including disease. To link specific microbes to specific phenotypes, microbiome-wide association studies compare microbial abundances between two groups of samples. Abundance differences, however, reflect not only direct associations with the phenotype, but also indirect effects due to microbial interactions. We found that microbial interactions could easily generate a large number of spurious associations that provide no mechanistic insight. Using techniques from statistical physics, we developed a method to remove indirect associations and applied it to the largest dataset on pediatric inflammatory bowel disease. Our method corrected the inflation of p-values in standard association tests and showed that only a small subset of associations is directly linked to the disease. Direct associations had a much higher accuracy in separating cases from controls and pointed to immunomodulation, butyrate production, and the brain-gut axis as important factors in the inflammatory bowel disease.

Submitted 30 January, 2018; v1 submitted 15 August, 2017; originally announced August 2017.

Comments: 4 main text figures, 15 supplementary figures (i.e., appendix) and 6 supplementary tables. Overall 49 pages including references

Journal ref: 2018. PLoS Comput Biol 14(1): e1005939

arXiv:1609.09254 [pdf, other]

doi 10.1142/S2339547816500114

Integrated bio-electrochemical model for a micro photosynthetic power cell

Authors: Hemanth Kumar Tanneru, Resmi Suresh M. P, Aravind Vyas Ramanan, Shahparnia. M, Muthukumaran Packirisamy, Pragasen Pillay, Sheldon Williamson, Philippe Juneau, Raghunathan Rengaswamy

Abstract: A simple first-principles mathematical model is developed to predict the performance of a micro photosynthetic power cell ($μ$PSC), an electrochemical device which generates electricity by harnessing electrons from photosynthesis in the presence of light. A lumped parameter approach is used to develop a model in which the electrochemical kinetic rate constants and diffusion effects are lumped into a single characteristic rate constant $K$. A non-parametric estimation of $K$ for the $μ$PSC is performed by minimizing the sum of squared errors (SSE) between the experimental and model-predicted currents and voltages. The developed model is validated by comparing the model-predicted $v-i$ characteristics with experimental data not used in the parameter estimation. Sensitivity analysis of the design parameters and the operational parameters reveals interesting insights for performance enhancement. Analysis of the model also suggests that there are two different operating regimes that are observed in this $μ$PSC. This modeling approach can be used in other designs of $μ$PSCs for performance enhancement studies.

Submitted 29 September, 2016; originally announced September 2016.

Comments: 9 pages, 11 figures, submitted to JMEMS

Journal ref: Micro photosynthetic cell for power generation from algae: Bio-electrochemical modeling and verification 2016

Showing 1–6 of 6 results for author: Ramanan, V